A powerful Python tool for downloading images directly from websites without needing CSV files. This tool can extract image URLs from any website and download them concurrently with robust error handling and progress tracking.
- Direct Website Scraping: Extract image URLs directly from any website
- Concurrent Downloads: Download multiple images simultaneously for efficiency
- Robust Error Handling: Automatic retries for failed downloads
- Progress Tracking: Visual progress bars and real-time statistics
- Flexible Filtering: Filter images by size, extension, or custom patterns
- Resumable Downloads: Continue interrupted downloads where you left off
- JavaScript Support: Option to use Selenium for JavaScript-heavy websites
- Throttling: Control download speed to avoid overwhelming servers
- Comprehensive Logging: Detailed logs for troubleshooting
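
The concurrent download and retry behaviour listed above can be pictured with a thread pool. The following is a minimal sketch, not the tool's actual implementation: `download_with_retries`, `download_all`, and the injected `fetch` callable are hypothetical names, and it assumes that `retries` means additional attempts after the first one.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_with_retries(url, fetch, retries=3, backoff=0.0):
    """Call fetch(url) up to retries + 1 times; return its result or raise the last error."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # assumption: a real tool would catch narrower network errors
            last_error = exc
            if attempt < retries:
                # Exponential backoff between attempts (no delay when backoff is 0).
                time.sleep(backoff * (2 ** attempt))
    raise last_error


def download_all(urls, fetch, workers=10, retries=3):
    """Download URLs concurrently; return (results, failures), both keyed by URL."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(download_with_retries, url, fetch, retries): url
            for url in urls
        }
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                failures[url] = exc
    return results, failures
```

With the `requests` package, `fetch` could be something like `lambda url: requests.get(url, timeout=30).content`. Passing the fetch function in keeps the retry and pooling logic testable without network access.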
- Python 3.6 or higher
- Required Python packages:
  - requests
  - beautifulsoup4
  - pillow
  - tqdm
  - selenium (optional, for JavaScript-heavy websites)
pip install requests beautifulsoup4 pillow tqdm
# Optional for JavaScript-heavy websites
pip install selenium webdriver-manager

To download all images from a website:

python image_downloader.py https://example.com

This will download all images to a directory named downloaded_images in the current working directory.

- -o, --output-dir: Directory to save downloaded images (default: downloaded_images)
- --create-subdir: Create a subdirectory based on the website domain

- --min-size: Minimum image size in bytes (default: 0)
- --extensions: Comma-separated list of allowed image extensions (default: .jpg,.jpeg,.png,.gif,.webp)
- --include-regex: Regular expression pattern that image URLs must match
- --exclude-regex: Regular expression pattern that image URLs must not match

- -w, --workers: Number of concurrent download workers (default: 10)
- --timeout: Connection timeout in seconds (default: 30)
- --retries: Number of retry attempts for failed downloads (default: 3)
- --throttle: Throttle downloads to N requests per second (0 for no limit)
- --no-resume: Do not resume from previous download state
- --retry-failed: Retry previously failed downloads

- --use-selenium: Use Selenium for JavaScript-heavy websites
- --wait-time: Wait time in seconds for JavaScript to load (default: 5)

- --no-progress: Do not show progress bar
- -q, --quiet: Quiet mode (minimal output)
- -v, --verbose: Verbose mode (detailed output)
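
To make the filtering options concrete, here is a rough sketch of how --extensions, --include-regex, --exclude-regex, and --min-size might combine. The function names are hypothetical and the matching rules are assumptions (extension checked against the URL path, case-insensitively; regexes searched against the full URL); the tool itself may behave differently.

```python
import re
from urllib.parse import urlparse

DEFAULT_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def parse_extensions(value):
    """Turn a comma-separated string such as '.jpg,.png' into a tuple of extensions."""
    return tuple(ext.strip().lower() for ext in value.split(",") if ext.strip())


def url_passes_filters(url, extensions=DEFAULT_EXTENSIONS,
                       include_regex=None, exclude_regex=None):
    """Return True if the URL has an allowed extension and satisfies both regex rules."""
    # Compare against the path only, so query strings do not hide the extension.
    path = urlparse(url).path.lower()
    if not path.endswith(tuple(ext.lower() for ext in extensions)):
        return False
    if include_regex and not re.search(include_regex, url):
        return False
    if exclude_regex and re.search(exclude_regex, url):
        return False
    return True


def size_passes_filter(content, min_size=0):
    """Return True if the downloaded bytes are at least min_size long."""
    return len(content) >= min_size
```

Checking the size after download is the simplest approach; a tool could also inspect the Content-Length header first, when the server provides one, to skip small files early.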

python image_downloader.py https://example.com
python image_downloader.py https://example.com -o my_images
python image_downloader.py https://example.com --extensions .jpg,.png
python image_downloader.py https://example.com --min-size 10240
python image_downloader.py https://example.com -w 20
python image_downloader.py https://example.com --use-selenium
python image_downloader.py https://example.com --throttle 2
python image_downloader.py https://example.com --include-regex "product|large"
python image_downloader.py https://example.com --retry-failed

If no images are found:
- Check if the website uses JavaScript to load images. Try using the --use-selenium option.
- Check if images are loaded from a different domain. The tool only extracts images from the specified website by default.
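
Both checks above come down to how image URLs are collected from the page. The sketch below illustrates one possible reading, using the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. `ImgSrcParser`, `extract_image_urls`, and the `same_domain_only` parameter are hypothetical, and treating "the specified website" as a host-name match is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in static HTML."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html, page_url, same_domain_only=True):
    """Return absolute, de-duplicated image URLs found in the HTML."""
    parser = ImgSrcParser()
    parser.feed(html)
    page_host = urlparse(page_url).netloc
    urls = []
    for src in parser.sources:
        absolute = urljoin(page_url, src)  # resolve relative paths against the page
        if same_domain_only and urlparse(absolute).netloc != page_host:
            continue  # assumption: images on other hosts are skipped by default
        if absolute not in urls:
            urls.append(absolute)
    return urls
```

A parser like this only sees markup present in the initial HTML response. Images injected later by JavaScript never appear in it, which is why a browser-driven approach such as Selenium can find images that static parsing misses.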
If downloads fail:
- Check your internet connection
- Try increasing the timeout with --timeout 60
- Some websites may block automated downloads. Try using the --throttle option to slow down requests.
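
As an illustration of what throttling to N requests per second can mean, here is a small rate limiter that spaces requests evenly across worker threads. It is a sketch under assumptions: the `Throttle` class is hypothetical, and the tool may limit requests in some other way.

```python
import threading
import time


class Throttle:
    """Allow at most `rate` calls to wait() per second; rate <= 0 disables the limit."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate if rate > 0 else 0.0
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = None  # earliest time the next request may start

    def wait(self):
        """Block until this caller's time slot arrives; return the delay in seconds."""
        if self.interval == 0.0:
            return 0.0
        with self._lock:
            now = self._clock()
            if self._next_slot is None or self._next_slot < now:
                slot = now
            else:
                slot = self._next_slot
            # Reserve the following slot for the next caller.
            self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so other threads can reserve slots
        return delay
```

Each worker would call `throttle.wait()` immediately before sending a request. With `Throttle(2)`, requests start at least 0.5 seconds apart regardless of how many workers are running.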
If using Selenium:
- Ensure you have Chrome installed
- Try increasing the wait time with --wait-time 10
For detailed logs, use the verbose mode:

python image_downloader.py https://example.com -v

Logs are also saved to image_downloader.log in the current directory.
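
A setup like this, with console verbosity controlled by flags and a log file written alongside, can be sketched with the standard `logging` module. The function name, logger name, message format, and the choice to always write DEBUG-level detail to the file are all assumptions, not the tool's confirmed behaviour.

```python
import logging


def configure_logging(verbose=False, quiet=False, log_file="image_downloader.log"):
    """Log everything to a file and a flag-dependent subset to the console."""
    logger = logging.getLogger("image_downloader")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep messages out of the root logger

    # Assumption: the file always receives full detail for troubleshooting.
    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)

    console = logging.StreamHandler()
    if verbose:
        console.setLevel(logging.DEBUG)
    elif quiet:
        console.setLevel(logging.WARNING)
    else:
        console.setLevel(logging.INFO)

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (file_handler, console):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Keeping the file handler at DEBUG means a quiet console run still leaves a complete record to inspect after a failure.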
This project is licensed under the MIT License.