Crawling content with wget


This note shows how to use wget to crawl a target site. wget mirrors the site's content to the local disk, in a directory named after the host. You can then use the tree utility to show the directory structure and grep to search the downloaded files.

Crawl the site with wget

wget -m -nv example.com
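
The command above fetches pages as fast as the server answers and may follow links above the starting path. A gentler variant could look like the sketch below. The wrapper name mirror_site is hypothetical, and the one-second wait is an assumption rather than a recommendation from these notes; the flags themselves are standard GNU wget options.

```shell
#!/bin/sh
# mirror_site is a hypothetical wrapper, not part of the original notes.
mirror_site() {
    # $1: host or URL to mirror, e.g. example.com
    # --mirror      recursive download with timestamping and infinite depth
    # --no-parent   never ascend above the starting directory
    # --wait=1      pause one second between requests (assumed value)
    # --random-wait vary that pause so requests are less regular
    # --no-verbose  print one short line per file (same as -nv)
    wget --mirror --no-parent --wait=1 --random-wait --no-verbose "$1"
}
```

Calling mirror_site example.com should leave the same kind of local mirror as the plain command, only more slowly.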

Show the directory structure

tree

Useful grep search patterns

grep -r -i '<script'
grep -r -i '<script type="text/javascript" src="'
grep -r -i 'type=hidden'
grep -r '<!--'
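
The pattern type=hidden only matches an unquoted attribute value, so it misses type="hidden" and type='hidden'. A minimal sketch of a broader search follows. The function name find_hidden_inputs is hypothetical, and the directory argument is assumed to be the mirror that wget created.

```shell
#!/bin/sh
# find_hidden_inputs is a hypothetical helper, not part of the original notes.
find_hidden_inputs() {
    # $1: directory to search (defaults to the current directory)
    # -r recurse, -i ignore case, -n print line numbers, -E extended regex
    # The regex accepts an optional single or double quote before "hidden".
    grep -r -i -n -E "type=[\"']?hidden" "${1:-.}"
}
```

Running find_hidden_inputs example.com prints each match as file:line:text, which makes it easier to open the page at the right spot.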