wget – Retrieving Files Using HTTP: A Comprehensive Guide

In the world of command-line tools, wget stands out as a powerful, non-interactive utility for downloading files from the web. Whether you need to fetch a single file, mirror an entire website, or automate downloads in scripts, wget is a go-to tool for developers, system administrators, and power users. This post focuses on retrieving files over HTTP with wget: basic usage, advanced features, best practices, and troubleshooting.

Table of Contents

  1. What is wget?
  2. Installing wget
  3. Basic HTTP File Retrieval
  4. Advanced HTTP Features
  5. Common Practices
  6. Best Practices
  7. Troubleshooting Common Issues
  8. Conclusion
  9. References

What is wget?

wget (short for "World Wide Web get") is a free, open-source command-line utility developed by the GNU Project. It is designed to retrieve files from web servers using HTTP, HTTPS, and FTP protocols. Key features include:

  • Non-interactive operation: Runs without user input and can detach to the background with -b, making it ideal for scripts and cron jobs.
  • Resumable downloads: Continues interrupted downloads from where they left off.
  • Recursive retrieval: Crawls and downloads entire directories or websites.
  • Bandwidth control: Limits download speed to avoid overwhelming servers or networks.
  • Flexible configuration: Supports custom headers, user authentication, cookies, and more.

wget is preinstalled on most Linux systems. On macOS, only curl is preinstalled by default, and wget requires manual installation. It is also available for Windows. Its simplicity and versatility make it a staple for web-related tasks.

Note on GNU Wget2: GNU Wget2 is currently under active development as the successor to Wget. It adds features such as multi-threaded downloads, HTTP/2 support, HTTP compression, and improved performance. Some Linux distributions (e.g., Fedora) have experimented with replacing wget with Wget2, though Wget remains the widely used standard. For the latest status, see the Wget2 project page.

Installing wget

If wget isn’t already installed on your system, use the following commands to install it:

Linux (Debian/Ubuntu)

sudo apt update && sudo apt install wget

Linux (RHEL/CentOS/Fedora)

sudo dnf install wget   # For RHEL 8+, CentOS Stream, Fedora
# Or for older RHEL/CentOS 7 (EOL):
sudo yum install wget

macOS

Using Homebrew:

brew install wget

Windows

  • Chocolatey: choco install wget
  • Scoop: scoop install wget
  • Download the binary directly from the GNU wget website.

Basic HTTP File Retrieval

The simplest use case for wget is downloading a single file from an HTTP URL. The syntax is:

wget [URL]

Example: Download a Text File

To download a sample text file from http://example.com/file.txt:

wget http://example.com/file.txt

Output Explanation:

--2026-06-08 10:00:00--  http://example.com/file.txt
Resolving example.com (example.com)... 93.184.216.34
Connecting to example.com (example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1234 (1.2K) [text/plain]
Saving to: ‘file.txt’

file.txt           100%[===================>]   1.20K  --.-KB/s    in 0s      

2026-06-08 10:00:01 (12.3 MB/s) - ‘file.txt’ saved [1234/1234]

  • Resolving example.com: wget resolves the domain to an IP address.
  • Connecting to...: Establishes a TCP connection to port 80 (HTTP).
  • HTTP request sent... 200 OK: The server returns a success status code.
  • Saving to: ‘file.txt’: The file is saved with the same name as the URL’s filename.

Advanced HTTP Features

wget offers a rich set of flags to handle complex HTTP scenarios. Below are key features with examples.

Resuming Interrupted Downloads

Use -c (or --continue) to resume a partially downloaded file. This is critical for large files or unstable connections.

Example: Resume a failed download of large-file.iso:

wget -c http://example.com/large-file.iso

wget checks the local file size and requests the remaining bytes from the server (requires the server to support Range requests).
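
On a very unstable connection, a single -c run can still give up. wget already retries transient errors on its own (see --tries), so the following is only a minimal sketch of an outer retry loop for the cases where wget exits entirely. The function name resume_download, the default of 5 attempts, and the 5-second pause are illustrative assumptions, not wget defaults.

```shell
# Hypothetical helper: re-run "wget -c" until it succeeds or attempts run out.
# Usage: resume_download URL [MAX_ATTEMPTS] [PAUSE_SECONDS]
resume_download() {
    url=$1
    max_attempts=${2:-5}
    pause=${3:-5}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -c asks the server only for the bytes missing from the local file.
        if wget -c "$url"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
        # Pause before the next attempt, but not after the last one.
        [ "$attempt" -le "$max_attempts" ] && sleep "$pause"
    done
    return 1
}

# Usage (not run here):
#   resume_download http://example.com/large-file.iso 10 30
```

Because each attempt passes -c, a later attempt picks up from whatever the earlier ones managed to write.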

Specifying Output Filenames

By default, wget saves files using the filename from the URL. Use -O (or --output-document) to override this.

Example: Save image.jpg as vacation-photo.jpg:

wget -O vacation-photo.jpg http://example.com/image.jpg

⚠️ Note: Always use -O with a filename, not a directory. To save to a directory, use -P (e.g., wget -P ./downloads http://example.com/file.txt).

Limiting Bandwidth

Prevent wget from consuming all available bandwidth with --limit-rate. The value is in bytes per second by default; append k for kilobytes or m for megabytes.

Example: Limit download speed to 1MB/s:

wget --limit-rate=1m http://example.com/large-file.zip
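
To pick a sensible limit, it helps to estimate how long a download will take at a given rate. The sketch below does that arithmetic in plain shell. It assumes wget's k and m suffixes are binary multiples (1m = 1024 × 1024 bytes per second); the function name estimate_seconds and the 700 MiB file size are made up for illustration.

```shell
# Rough transfer-time estimate for a rate-limited download.
# Usage: estimate_seconds SIZE_BYTES RATE_BYTES_PER_SECOND
estimate_seconds() {
    size_bytes=$1
    rate_bytes_per_sec=$2
    # Integer division, rounded up to the next whole second.
    echo $(( ($size_bytes + $rate_bytes_per_sec - 1) / $rate_bytes_per_sec ))
}

# A 700 MiB file at --limit-rate=1m (assumed to be 1048576 bytes/s):
estimate_seconds $((700 * 1024 * 1024)) $((1024 * 1024))   # prints 700
```

The result is a lower bound: it ignores connection setup and assumes the server can actually deliver at the limit.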

Custom User-Agents

Some servers block requests from default wget user-agents (e.g., Wget/1.25). Use -U (or --user-agent) to mimic a browser.

Example: Pretend to be Chrome:

wget -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36" http://example.com/protected-page.html

Sending HTTP Headers

Add custom headers (e.g., Referer, Accept-Language) with --header.

Example: Send a Referer header to bypass hotlink protection:

wget --header "Referer: http://example.com/gallery" http://example.com/secret-image.jpg

HTTP Authentication

For servers requiring username/password (Basic or Digest auth), use --user and --password.

Example: Download a protected file:

wget --user=myuser --password=mypass http://example.com/secure/file.pdf

POST Requests

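Simulate form submissions with --post-data (an inline string) or --post-file (the request body is read from a file). Both send the data as application/x-www-form-urlencoded; wget does not URL-encode it for you and does not support multipart/form-data uploads.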

Example: Submit a login form:

wget --post-data "username=john&password=doe" http://example.com/login
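
Values that contain spaces, &, or = have to be percent-encoded before they go into --post-data. The following is a minimal sketch of an encoder in POSIX shell, assuming ASCII input only; the function name urlencode is hypothetical and not part of wget. It encodes a space as %20 rather than +, which form handlers generally accept as well.

```shell
# Hypothetical helper: percent-encode an ASCII string for use in --post-data.
# The body runs in a subshell, so its variables do not leak to the caller.
urlencode() (
    LC_ALL=C
    rest=$1
    out=
    while [ -n "$rest" ]; do
        tail=${rest#?}        # everything after the first character
        c=${rest%"$tail"}     # the first character itself
        case $c in
            # RFC 3986 "unreserved" characters pass through unchanged.
            [a-zA-Z0-9.~_-]) out="$out$c" ;;
            # Everything else becomes %XX (the character code in hex).
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
        rest=$tail
    done
    printf '%s\n' "$out"
)

# Usage (not run here):
#   wget --post-data "username=$(urlencode 'john doe')&password=$(urlencode 'p&ss=1')" \
#        http://example.com/login
```

Only the values are encoded; the & and = that separate fields stay literal.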

Cookies

wget can save and load cookies to maintain sessions (e.g., after logging in).

  • Save cookies to a file: --save-cookies cookies.txt
  • Keep session cookies too: --keep-session-cookies (by default, cookies without an expiry time are not written to the file)
  • Load cookies from a file: --load-cookies cookies.txt

Example: Log in and download a session-protected file:

# Step 1: Log in and save cookies, including session cookies
wget --post-data "user=john&pass=doe" --save-cookies cookies.txt --keep-session-cookies http://example.com/login

# Step 2: Use cookies to download a protected file
wget --load-cookies cookies.txt http://example.com/user-data.csv

Recursive Download & Mirroring

Use -r (or --recursive) to download an entire directory or website. For a full mirror, --mirror is shorthand for -r -N -l inf --no-remove-listing: recursion with timestamp checking and no depth limit.

Example: Mirror a website (preserve directory structure, convert links to local, and download prerequisites like CSS/JS):

wget --mirror \
  --convert-links \
  --adjust-extension \
  --page-requisites \
  --no-parent \
  http://example.com/blog/

Control depth: Use -l [depth] (e.g., -l 2 for 2 levels deep). Plain -r stops at 5 levels by default, while --mirror sets no limit.

Common Practices

1. Download All Files of a Specific Type

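Use -A (accept) to filter by file extension during a recursive download. Adding --no-parent keeps wget from climbing above the starting directory.

Example: Download all .pdf files from a directory:

wget -r --no-parent -A pdf http://example.com/documents/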

2. Schedule Downloads with cron

Automate nightly downloads by adding a cron job.

Example: Download a daily backup at 2 AM:

# Edit crontab: crontab -e
0 2 * * * wget -q -O /backups/daily.zip http://example.com/backups/daily.zip

-q (quiet) suppresses output.
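
One caveat with the cron line above: -O truncates the target file as soon as wget starts, so a failed run would overwrite yesterday's backup with an empty or partial file. A minimal sketch of a wrapper that avoids this by downloading to a temporary name first; the function name fetch_atomically and the .part suffix are assumptions made for illustration.

```shell
# Hypothetical wrapper: replace DEST only when the download succeeds.
# Usage: fetch_atomically URL DEST
fetch_atomically() {
    url=$1
    dest=$2
    tmp="$dest.part"
    if wget -q -O "$tmp" "$url"; then
        # A rename within one filesystem is atomic, so readers never
        # see a half-written file.
        mv -f "$tmp" "$dest"
    else
        status=$?        # wget's exit status
        rm -f "$tmp"     # discard the partial download
        return "$status"
    fi
}

# Usage from a script called by cron (not run here):
#   fetch_atomically http://example.com/backups/daily.zip /backups/daily.zip
```

The previous backup stays in place until a complete replacement is ready.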

3. Test URLs Without Downloading

Use --spider to check if a URL exists (useful for monitoring).

Example: Verify a file is available:

wget --spider http://example.com/critical-file.txt

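wget reports 200 OK and exits with status 0 if the file exists. For a 404 Not Found it reports the error and exits with a non-zero status (8, meaning the server issued an error response), which makes the check easy to script.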
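
For monitoring, the exit status is more convenient than parsing the printed output. A minimal sketch, with the hypothetical function names url_exists and check_url:

```shell
# Hypothetical helper: succeed only if wget can reach the URL.
url_exists() {
    # --spider checks the URL without saving the body; -q silences output.
    wget -q --spider "$1"
}

# Print a one-line report for a URL based on wget's exit status.
check_url() {
    if url_exists "$1"; then
        echo "OK $1"
    else
        echo "MISSING $1 (wget exit status $?)"
    fi
}

# Usage (not run here):
#   check_url http://example.com/critical-file.txt
```

A cron job or monitoring script can call check_url for each URL and alert on any line that starts with MISSING.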

Best Practices

1. Respect Server Policies

  • Check robots.txt: Many sites publish a robots.txt (e.g., http://example.com/robots.txt) to restrict crawlers, and wget honors it during recursive downloads. Override this with -e robots=off only if explicitly allowed (e.g., your own server).
  • Avoid Overloading Servers: Use --wait=2 (wait 2 seconds between requests) and --random-wait (randomize wait times) to be polite.
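
The polite settings above can be bundled into a small wrapper so they are not forgotten. This is a sketch only: the function name polite_fetch and the 200k rate cap are illustrative choices, not recommendations from the wget project.

```shell
# Hypothetical wrapper: recursive download with conservative defaults.
# Any extra arguments (more flags, the URL) are passed through to wget.
polite_fetch() {
    wget --recursive --no-parent \
         --wait=2 --random-wait \
         --limit-rate=200k \
         "$@"
}

# Usage (not run here):
#   polite_fetch -l 2 http://example.com/docs/
```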

2. Security First

  • Verify SSL Certificates: For HTTPS, wget checks certificates by default. Use --no-check-certificate only for testing (insecure!).
  • Avoid Plaintext Credentials: Instead of --password, use --ask-password to prompt for input (prevents credentials in command history).
  • Keep wget Updated: Versions through 1.21.1 are affected by CVE-2021-31879, which leaks the Authorization header when following redirects to a different origin. Run a current release and check your distribution's security tracker to confirm that the fix is included.

3. Logging & Debugging

  • Save Logs: Use --output-file=wget.log to log all activity for debugging.
  • Verbose Mode: wget is verbose by default (-v). Add -d (debug) for more detail, or -S (--server-response) to print the server's response headers.
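
In scripts, a log file is most useful when it is shown only on failure. A minimal sketch of that pattern; the function name fetch_logged and the choice of showing the last 5 lines are assumptions for illustration.

```shell
# Hypothetical helper: log to a file and show the end of the log on failure.
# Usage: fetch_logged URL [LOGFILE]
fetch_logged() {
    url=$1
    log=${2:-wget.log}
    # --output-file sends wget's messages to the log instead of the terminal.
    if ! wget --output-file="$log" "$url"; then
        echo "download failed; last lines of $log:" >&2
        tail -n 5 "$log" >&2
        return 1
    fi
}

# Usage (not run here):
#   fetch_logged http://example.com/file.txt /tmp/wget.log
```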

4. Clean Up

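  • Use --delete-after to delete each file as soon as it has been downloaded. This suits tasks such as warming a proxy cache or checking that pages can be retrieved; it does not leave the file around for later processing.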

Troubleshooting Common Issues

1. 403 Forbidden

Cause: Server blocks wget’s user-agent or requires a referrer.
Fix: Use a browser user-agent (-U) or add a referrer header (--header "Referer: ...").

2. 404 Not Found

Cause: Invalid URL or file removed.
Fix: Verify the URL with --spider or check the server’s response.

3. Slow Downloads

Cause: Server throttling or network congestion.
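Fix: Use --limit-rate to stay under the server's throttling threshold. If a stale or slow proxy cache is involved, --no-cache asks intermediaries for a fresh copy from the origin server (wget itself keeps no local cache).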

4. SSL/TLS Errors

Cause: Expired certificate or missing CA roots.
Fix: Update CA certificates (sudo apt install ca-certificates on Debian/Ubuntu) or use --no-check-certificate (insecure, for testing only).
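
When wget runs with -q in a script, its exit status is often the only clue to what went wrong. The sketch below maps the exit statuses listed in the wget manual to short descriptions; the function name explain_wget_status is hypothetical.

```shell
# Hypothetical helper: describe a wget exit status.
# Usage: explain_wget_status STATUS
explain_wget_status() {
    case $1 in
        0) echo "no problems occurred" ;;
        1) echo "generic error" ;;
        2) echo "parse error (command line or .wgetrc)" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "username/password authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response (for example 403 or 404)" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Usage (not run here):
#   wget -q http://example.com/file.txt; explain_wget_status $?
```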

Conclusion

wget is a versatile tool for HTTP file retrieval, offering simplicity for basic tasks and power for advanced workflows. By mastering its flags, from resuming downloads to mirroring websites, you can automate and streamline web-related tasks efficiently. Remember to use wget responsibly, respecting server policies and security best practices.

References