Using wget to download websites
I know, there are GUI tools for this. But what if you are stuck in a terminal-only environment? Besides, nothing beats the plain old terminal.
The simplest form is
wget -r [url] or 
wget --recursive [url]
--recursive: Turn on recursive retrieving. The default maximum depth is 5.
Now, before you go downloading the internet, exercise a little patience. Let's cover some more basics.
The above command downloads the URL, but the result is not really suitable for offline viewing. The links in the downloaded document(s) will still point to the internet. To convert them to relative (offline) links, do this
wget -rk [url] or
wget --recursive --convert-links [url]
--convert-links: After the download is complete, convert the links in the document(s) to make them suitable for local viewing.
The above command alters the document(s) for offline viewing. If you want wget to keep the original files as well, do this
wget -rkK [url] or
wget --recursive --convert-links --backup-converted [url]
--backup-converted: When converting a file, back up the original version with a .orig suffix.
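If you later want to see which files wget actually rewrote, you can look for those .orig backups. Here is a minimal sketch, assuming the download landed in a directory named after the host (wget's default for recursive downloads). list_backups is a made-up helper name for illustration, not something wget provides.

```shell
# list_backups is a hypothetical helper, not part of wget.
# It prints every .orig backup found under the given directory
# (or the current directory if none is given), sorted by path.
list_backups() {
    find "${1:-.}" -type f -name '*.orig' | sort
}
```

For example, after downloading example.com you would run list_backups example.com. Each path it prints is the untouched copy of a file whose links were converted.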
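The above command(s) can still leave pages without the files they need, because recursion stops at the maximum depth and the pages at that depth may miss their requisites. To tell wget to download all files necessary to display each page properly (images, sounds, linked CSS etc.) use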
wget -rkKp [url] or
wget --recursive --convert-links --backup-converted --page-requisites [url]
--page-requisites: This option causes wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.
Again, don’t go yet. The default depth of links to follow is 5. This might be too much (or too little, in case your plan is to download the whole internets). You can specify the depth thus
wget -rkKpl 3 [url] or
wget --recursive --convert-links --backup-converted --page-requisites --level=3 [url]
--level=depth: Specify the maximum recursion depth.
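Putting the options so far together, here is a rough sketch of a small shell wrapper. The name mirror_site and the DRY_RUN switch are my own inventions for illustration, and the default depth of 3 is just the value from the example above. None of them come from wget itself.

```shell
# mirror_site is a hypothetical wrapper, not part of wget.
# Usage: mirror_site URL [DEPTH]
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_site() {
    if [ -z "${1:-}" ]; then
        echo "usage: mirror_site URL [DEPTH]" >&2
        return 1
    fi

    # Replace the function's arguments with the full wget command line.
    # The depth falls back to 3 (an arbitrary choice) when none is given.
    set -- wget --recursive --convert-links --backup-converted \
        --page-requisites --level="${2:-3}" "$1"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # show what would be run
    else
        "$@"         # actually run wget
    fi
}
```

Running DRY_RUN=1 mirror_site [url] 2 first is a cheap way to check the command line before wget starts pulling anything down.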
Finally, you might want wget to do all the hard work of downloading the internet and delete the files immediately after.
wget -r --delete-after [url]
This is not all wget can do, though. To learn more about the various options, check the man page for wget
man wget
That’s it, folks. Happy interwebs downloading.