Wednesday, January 23, 2008

Using wget to download a whole site

I found the following notes and thought I'd try them out.

Recursive options for wget.

wget -r -p -l 2

-r = wget recursively
-p = download all files (incl. images) needed to render the HTML pages
-l 2 = descend maximum 2 levels (default is 5)
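For reference, the same three options have long-form names, which can make a saved script easier to read later. This is a minimal sketch: mirror_site is a hypothetical helper name, and the URL is whatever you pass in.

```shell
#!/bin/sh
# Hypothetical helper: mirror_site URL
# Long-form equivalent of: wget -r -p -l 2 URL
mirror_site() {
  # --recursive       is -r (follow links recursively)
  # --page-requisites is -p (also fetch images etc. needed to render each page)
  # --level=2         is -l 2 (recurse at most 2 levels; default is 5)
  wget --recursive --page-requisites --level=2 "$1"
}
```

Usage: mirror_site http://example.com/ (a placeholder URL).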

Another useful option is "-np" or "--no-parent". This prevents wget from ascending to the parent directory, which helps when you want to download one directory of a site but not everything on the site.
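One detail worth knowing: as I understand the wget manual, for HTTP URLs the trailing slash matters to --no-parent, because wget only treats the last path component as a directory if the URL ends in "/". Without it, the "parent" is one level higher than you meant. This is a minimal sketch, assuming you always pass it a directory URL; mirror_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: mirror_dir URL
# Mirrors one directory (2 levels deep) without climbing into its parent.
mirror_dir() {
  case $1 in
    */) url=$1 ;;
    *)  url="$1/" ;;  # add a trailing slash so wget treats the last component as a directory
  esac
  wget -r -p -l 2 -np "$url"
}
```

For example, mirror_dir http://example.com/some/dir (a placeholder URL) runs wget -r -p -l 2 -np http://example.com/some/dir/.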

So here goes my attempt to download the whole Quran recited by Muhammad Ayyoub:

wget -r -p -l 2

The above method downloads everything! To keep just the MP3 files:

wget -r -l1 --no-parent -A.mp3
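The MP3-only command can also be written with long options, which makes each part self-explanatory. This is a minimal sketch; fetch_mp3s is a hypothetical helper name and the URL is a placeholder you supply.

```shell
#!/bin/sh
# Hypothetical helper: fetch_mp3s URL
# Long-form equivalent of: wget -r -l1 --no-parent -A.mp3 URL
fetch_mp3s() {
  # --level=1     : follow links only one level down from the starting page
  # --no-parent   : never climb above the starting directory
  # --accept=.mp3 : keep only files whose names end in .mp3
  wget --recursive --level=1 --no-parent --accept=.mp3 "$1"
}
```

Note that, as far as I can tell, wget still fetches the HTML index pages to find the links, then deletes them afterwards because they don't match the accept list.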

It seems to work. It's taken 24 hours just to get to Surah 26 on an 800 MHz G4 with a static IP.

