Crawling content with wget

This section shows how to use wget to crawl a target site. wget writes a mirror of the site's content to the local disk, by default under a directory named after the host. You can then use the tree utility to show the directory structure.

wget -r -m -nv <target-url>
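
The mirror command above can be combined with a few other standard wget options. The sketch below only prints a candidate command instead of running it, so it makes no network requests. The function name build_mirror_cmd, the host example.test and the directory ./mirror are placeholders chosen for illustration, not part of the original notes.

```shell
#!/bin/sh
# Print (do not run) a wget mirror command for a URL and an output directory.
# build_mirror_cmd is a hypothetical helper name used only in this sketch.
build_mirror_cmd() {
    url=$1
    dest=$2
    # --mirror            long form of -m (recursive, timestamped, unlimited depth)
    # --no-verbose        long form of -nv
    # --no-parent         do not climb above the starting path
    # --wait=1            pause one second between requests
    # --directory-prefix  directory the mirror is written into
    printf 'wget --mirror --no-verbose --no-parent --wait=1 --directory-prefix=%s %s\n' \
        "$dest" "$url"
}

# example.test and ./mirror are placeholders; substitute the real target.
build_mirror_cmd "http://example.test/" "./mirror"
```

Printing the command first makes it easy to review the options before pointing wget at a real host.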

Show the directory structure

tree
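
As a rough illustration of inspecting a mirror, the sketch below builds a small stand-in directory and lists it. It assumes tree may not be installed and falls back to find in that case. The directory layout (example.test, js/app.js) is invented for the example.

```shell
#!/bin/sh
# Build a throwaway directory that imitates a small wget mirror.
# The host name and file names are made up for illustration.
mirror_dir=$(mktemp -d)
mkdir -p "$mirror_dir/example.test/js"
: > "$mirror_dir/example.test/index.html"
: > "$mirror_dir/example.test/js/app.js"

# Prefer tree when it is installed, otherwise fall back to find.
if command -v tree >/dev/null 2>&1; then
    tree "$mirror_dir"
else
    (cd "$mirror_dir" && find . | sort)
fi

# A flat, sorted list of regular files is handy for later scripting.
file_list=$(cd "$mirror_dir" && find . -type f | sort)
printf '%s\n' "$file_list"
```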


Useful grep search patterns

grep -r -i '<script'
grep -r -i '<script type="text/javascript" src="'
grep -r -i -E "type=[\"']?hidden"
grep -r '<!--'
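
The patterns above can be paired with a few grep output options to make the results easier to read. The following is a minimal sketch run against a made-up sample directory rather than a real mirror. The file names and contents are assumptions, and -o is an extension available in GNU and BSD grep rather than a POSIX option.

```shell
#!/bin/sh
# Throwaway directory standing in for a wget mirror; both files are made up.
mirror_dir=$(mktemp -d)
cat > "$mirror_dir/index.html" <<'EOF'
<html>
<!-- TODO: remove debug endpoint -->
<script type="text/javascript" src="/js/app.js"></script>
</html>
EOF
printf '<html><p>No scripts here</p></html>\n' > "$mirror_dir/about.html"

# -l prints only the names of files that contain a match.
files_with_scripts=$(grep -r -i -l '<script' "$mirror_dir")

# -n prefixes each matching line with its line number.
comment_lines=$(grep -r -n '<!--' "$mirror_dir")

# -o prints only the matched text, -h drops the file name prefix.
script_srcs=$(grep -r -i -o -h 'src="[^"]*"' "$mirror_dir")

printf '%s\n' "$files_with_scripts" "$comment_lines" "$script_srcs"
```

Listing file names first (-l) narrows down where to look, and -o with -h gives a clean list of script sources that can be sorted or deduplicated.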