Thursday, August 27, 2015

wget - downloading a website recursively

There's this humor & stuff website I like, and I wanted a push-button way to grab the posts of the day so I could browse them on the train, where the wifi sucks.

Came up with this:

wget --domains=thechive.com,thechive.files.wordpress.com -r -l 1 --restrict-file-names=windows -E -H -k -K -p -e robots=off 'http://thechive.com/'
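For anyone squinting at the one-letter flags, here is a rough sketch of the same command spelled out with wget's long option names and wrapped in a function. The function name fetch_front_page and the WGET override variable are made up for this example (handy for a dry run); the options themselves are the ones from the command above.

```shell
#!/bin/sh
# Sketch only: fetch_front_page and $WGET are hypothetical names for this post.
#   --domains             only follow links into these hosts
#   --recursive --level=1 follow links, but just one hop from the start page
#   --restrict-file-names=windows  escape characters Windows can't have in names
#   --adjust-extension    (-E) add .html etc. where the saved name lacks it
#   --span-hosts          (-H) allow leaving the starting host (images live elsewhere)
#   --convert-links       (-k) rewrite links so the copy works offline
#   --backup-converted    (-K) keep a .orig copy of each file before converting
#   --page-requisites     (-p) grab the images/CSS needed to render each page
#   -e robots=off         ignore robots.txt
fetch_front_page() {
    url=${1:-http://thechive.com/}
    "${WGET:-wget}" \
        --domains=thechive.com,thechive.files.wordpress.com \
        --recursive --level=1 \
        --restrict-file-names=windows \
        --adjust-extension \
        --span-hosts \
        --convert-links --backup-converted \
        --page-requisites \
        -e robots=off \
        "$url"
}
```

Calling fetch_front_page with no argument fetches the same URL as the one-liner; running it as WGET=echo fetch_front_page just prints the arguments instead of downloading anything.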

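So all the images get grabbed, but they end up with a wonky suffix. That's --restrict-file-names=windows at work: it turns the ? that starts a URL's query string into an @, so everything from the @ onward in the file name is leftover query string.
So I threw together a shell script to clean 'em up: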

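# Run this inside the directory holding the images.
for f in *@*; do                  # only files that actually have an "@"
    [ -e "$f" ] || continue       # the glob matched nothing
    z=${f%%@*}                    # keep everything before the first "@"
    echo "$z"
    mv -- "$f" "$z"
done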
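Since -r together with -H drops files into one directory per host, a loop over the current directory only cleans up one folder at a time. Below is a minimal sketch of a recursive variant, assuming a find-based walk is acceptable; strip_query_suffix is a name I made up, and it doesn't cope with file names containing newlines.

```shell
#!/bin/sh
# Sketch only: strip_query_suffix is a hypothetical helper, not part of wget.
# Renames every file under the given directory (default: current directory)
# whose name contains "@", keeping only the part before the first "@".
strip_query_suffix() {
    root=${1:-.}
    find "$root" -type f -name '*@*' | while IFS= read -r f; do
        dir=$(dirname "$f")
        base=$(basename "$f")
        clean=${base%%@*}              # drop the first "@" and everything after it
        [ -n "$clean" ] || continue    # name started with "@": leave it alone
        if [ -e "$dir/$clean" ]; then
            # Don't clobber a file that already has the clean name.
            echo "skipping (target exists): $f" >&2
            continue
        fi
        mv -- "$f" "$dir/$clean"
    done
}
```

Run it from the download directory as strip_query_suffix . to clean everything wget created. If two downloads differ only in the part after the @, they map to the same clean name; this sketch keeps whichever gets renamed first and leaves the other untouched.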
