A CLI and microservice to fetch URLs and render them as:
- HTML: Rendered page content
- Markdown: Converted from HTML
- PNG screenshot: Full page capture

This repository contains two implementations with compatible APIs:

| Implementation | Directory | Package | Status |
|---|---|---|---|
| JavaScript/Node.js | `./js` | `@link-assistant/web-capture` | Production |
| Rust | `./rust` | `web-capture` | Production |

Both implementations provide the same command-line options and HTTP API endpoints, so you can choose either one based on your deployment preferences.

```bash
# JavaScript
cd js
npm install
npm run dev
```

```bash
# Rust
cd rust
cargo run -- --serve
```

Both implementations expose the same API:

| Endpoint | Description |
|---|---|
| `GET /html?url=<URL>` | Get rendered HTML content |
| `GET /markdown?url=<URL>` | Get Markdown conversion |
| `GET /image?url=<URL>` | Get PNG screenshot |
| `GET /fetch?url=<URL>` | Proxy fetch content |
| `GET /stream?url=<URL>` | Stream content |
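
Every endpoint takes the page to capture in the `url` query parameter, so a target that carries its own `?`, `&` or `#` should be percent-encoded to keep those characters from being read as part of the outer query string. The following is a minimal standard-library sketch of building such a request URL, assuming the servers decode the parameter in the usual RFC 3986 way; `percent_encode` and `endpoint_url` are hypothetical helpers, not functions exported by either implementation.

```rust
/// Percent-encode every byte outside the RFC 3986 "unreserved" set.
/// Multi-byte UTF-8 characters are encoded byte by byte.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Hypothetical helper: build `<base>/<endpoint>?url=<encoded target>`.
fn endpoint_url(base: &str, endpoint: &str, target: &str) -> String {
    format!(
        "{}/{}?url={}",
        base.trim_end_matches('/'),
        endpoint,
        percent_encode(target)
    )
}

fn main() {
    let url = endpoint_url(
        "http://localhost:3000",
        "markdown",
        "https://example.com/search?q=a&b=1",
    );
    // prints http://localhost:3000/markdown?url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Da%26b%3D1
    println!("{}", url);
}
```

The resulting string can then be requested with any HTTP client; encoding the whole target (including `:` and `/`) is stricter than necessary but is always safe inside a query value.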

```bash
# Capture a URL as HTML (output to stdout)
web-capture https://example.com

# Capture as Markdown and save to file
web-capture https://example.com --format markdown --output page.md

# Take a screenshot
web-capture https://example.com --format png --output screenshot.png

# Start as API server
web-capture --serve

# Start server on custom port
web-capture --serve --port 8080
```

| Option | Short | Description | Default |
|---|---|---|---|
| `--serve` | `-s` | Start as HTTP API server | - |
| `--port` | `-p` | Port to listen on | `3000` |
| `--format` | `-f` | Output format: `html`, `markdown`/`md`, `image`/`png` | `html` |
| `--output` | `-o` | Output file path | stdout (text) or auto-generated (images) |
| `--engine` | `-e` | Browser engine (JS only): `puppeteer`, `playwright` | `puppeteer` |
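
The options table maps directly onto a small argument parser. This is an illustrative sketch only, not the parser either CLI actually uses: the `Options`, `Format`, `next_value` and `parse_args` names are hypothetical, it assumes any bare (non-flag) argument is the URL, and it accepts `--engine` even though, per the table, only the JavaScript CLI honours it.

```rust
use std::str::FromStr;

/// Output formats from the table; `md` and `image` are accepted as aliases.
#[derive(Debug, PartialEq)]
enum Format {
    Html,
    Markdown,
    Png,
}

impl FromStr for Format {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "html" => Ok(Format::Html),
            "markdown" | "md" => Ok(Format::Markdown),
            "image" | "png" => Ok(Format::Png),
            other => Err(format!("unknown format: {other}")),
        }
    }
}

/// Hypothetical parsed options; defaults follow the "Default" column.
#[derive(Debug, PartialEq)]
struct Options {
    url: Option<String>,
    serve: bool,
    port: u16,
    format: Format,
    output: Option<String>, // None = stdout (text) or an auto-generated name (images)
    engine: String,
}

/// Take the value that follows a flag such as `--port`.
fn next_value(it: &mut impl Iterator<Item = String>, flag: &str) -> Result<String, String> {
    it.next().ok_or_else(|| format!("{flag} requires a value"))
}

fn parse_args(args: impl IntoIterator<Item = String>) -> Result<Options, String> {
    let mut opts = Options {
        url: None,
        serve: false,
        port: 3000,
        format: Format::Html,
        output: None,
        engine: "puppeteer".to_string(),
    };
    let mut it = args.into_iter();
    while let Some(arg) = it.next() {
        match arg.as_str() {
            "--serve" | "-s" => opts.serve = true,
            "--port" | "-p" => {
                opts.port = next_value(&mut it, "--port")?
                    .parse::<u16>()
                    .map_err(|e| format!("invalid port: {e}"))?;
            }
            "--format" | "-f" => opts.format = next_value(&mut it, "--format")?.parse::<Format>()?,
            "--output" | "-o" => opts.output = Some(next_value(&mut it, "--output")?),
            "--engine" | "-e" => {
                let engine = next_value(&mut it, "--engine")?;
                if engine != "puppeteer" && engine != "playwright" {
                    return Err(format!("unknown engine: {engine}"));
                }
                opts.engine = engine;
            }
            flag if flag.starts_with('-') => return Err(format!("unknown option: {flag}")),
            // Assumption: a bare (non-flag) argument is the URL to capture.
            url => opts.url = Some(url.to_string()),
        }
    }
    if !opts.serve && opts.url.is_none() {
        return Err("a URL is required unless --serve is given".to_string());
    }
    Ok(opts)
}

fn main() {
    match parse_args(std::env::args().skip(1)) {
        Ok(opts) => println!("{opts:?}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Because flags are matched anywhere in the argument list, both `web-capture https://example.com --format md` and `web-capture --format md https://example.com` parse to the same options.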

```bash
# JavaScript
cd js
docker build -t web-capture-js .
docker run -p 3000:3000 web-capture-js
```

```bash
# Rust
cd rust
docker build -t web-capture-rust .
docker run -p 3000:3000 web-capture-rust
```

```text
web-capture/
├── js/                    # JavaScript/Node.js implementation
│   ├── src/               # Source code
│   ├── bin/               # CLI entry point
│   ├── tests/             # Test files
│   ├── examples/          # Usage examples
│   ├── package.json       # npm package manifest
│   ├── Dockerfile         # Docker build file
│   └── README.md          # JavaScript-specific docs
│
├── rust/                  # Rust implementation
│   ├── src/               # Source code
│   │   ├── lib.rs         # Library exports
│   │   ├── main.rs        # CLI/server entry point
│   │   ├── browser.rs     # Browser automation
│   │   ├── html.rs        # HTML processing
│   │   └── markdown.rs    # Markdown conversion
│   ├── tests/             # Test files
│   ├── examples/          # Usage examples
│   ├── Cargo.toml         # Cargo package manifest
│   ├── Dockerfile         # Docker build file
│   └── README.md          # Rust-specific docs
│
├── scripts/               # Shared build/release scripts
│   ├── *.mjs              # JavaScript-specific scripts
│   └── rust-*.mjs         # Rust-specific scripts
│
├── .github/workflows/
│   ├── js.yml             # JavaScript CI/CD
│   └── rust.yml           # Rust CI/CD
│
└── README.md              # This file
```

```bash
# JavaScript
cd js
npm install
npm run dev    # Start dev server
npm test       # Run tests
npm run lint   # Run linter
```

```bash
# Rust
cd rust
cargo build    # Build
cargo test     # Run tests
cargo clippy   # Run linter
cargo fmt      # Format code
```

- HTML Rendering: Fetch and render HTML with JavaScript support via headless browsers
- Markdown Conversion: Clean HTML-to-Markdown conversion with proper formatting
- Screenshots: Capture PNG screenshots of web pages
- URL Normalization: Convert relative URLs to absolute
- Encoding Detection: Automatic charset detection and UTF-8 conversion
- Proxy Support: Fetch and stream content through the service
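
URL normalization means resolving relative links such as `img/logo.png` or `../api.html` against the address of the captured page. The sketch below is a simplified, standard-library-only take on the reference-resolution algorithm from RFC 3986 section 5, not the code either implementation ships: `resolve` and its helpers are hypothetical, the base is assumed to be a well-formed `scheme://authority/path` URL, and edge cases such as userinfo, IP literals and percent-encoding are ignored (a dedicated crate like `url`, via `Url::join`, covers those).

```rust
/// Split an absolute base URL into (scheme, authority, path+query+fragment).
fn split_base(base: &str) -> Option<(&str, &str, &str)> {
    let (scheme, after) = base.split_once("://")?;
    let end = after
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(after.len());
    Some((scheme, &after[..end], &after[end..]))
}

/// True if `s` starts with a URI scheme such as `https:` or `mailto:`.
fn has_scheme(s: &str) -> bool {
    match s.find(':') {
        Some(i) if i > 0 => {
            let scheme = &s[..i];
            scheme.starts_with(|c: char| c.is_ascii_alphabetic())
                && scheme.chars().all(|c| c.is_ascii_alphanumeric() || "+-.".contains(c))
        }
        _ => false,
    }
}

/// Remove "." and ".." segments from an absolute path (simplified RFC 3986 5.2.4).
fn remove_dot_segments(path: &str) -> String {
    let segments: Vec<&str> = path.split('/').collect();
    let mut out: Vec<&str> = Vec::new();
    for (i, seg) in segments.iter().enumerate() {
        let last = i == segments.len() - 1;
        match *seg {
            "." => {
                if last {
                    out.push(""); // keep the trailing slash
                }
            }
            ".." => {
                if out.len() > 1 {
                    out.pop(); // never pop the leading "" of an absolute path
                }
                if last {
                    out.push("");
                }
            }
            s => out.push(s),
        }
    }
    out.join("/")
}

/// Resolve `reference` against `base`; returns None if `base` is not absolute.
fn resolve(base: &str, reference: &str) -> Option<String> {
    if has_scheme(reference) {
        return Some(reference.to_string()); // already absolute
    }
    let (scheme, authority, rest) = split_base(base)?;
    if let Some(tail) = reference.strip_prefix("//") {
        return Some(format!("{scheme}://{tail}")); // protocol-relative
    }
    if reference.is_empty() || reference.starts_with('#') {
        let without_frag = base.split('#').next().unwrap_or(base);
        return Some(format!("{without_frag}{reference}"));
    }
    // Base path without query or fragment; an empty path means "/".
    let base_path = match rest.find(|c: char| c == '?' || c == '#') {
        Some(i) => &rest[..i],
        None => rest,
    };
    let base_path = if base_path.is_empty() { "/" } else { base_path };
    let origin = format!("{scheme}://{authority}");
    if reference.starts_with('?') {
        return Some(format!("{origin}{base_path}{reference}"));
    }
    // Keep the reference's own query/fragment aside while merging paths.
    let cut = reference.find(|c: char| c == '?' || c == '#').unwrap_or(reference.len());
    let (ref_path, suffix) = reference.split_at(cut);
    let merged = if ref_path.starts_with('/') {
        ref_path.to_string()
    } else {
        // Drop the last segment of the base path, then append the reference.
        let dir_end = base_path.rfind('/').map_or(0, |i| i + 1);
        format!("{}{}", &base_path[..dir_end], ref_path)
    };
    Some(format!("{origin}{}{suffix}", remove_dot_segments(&merged)))
}

fn main() {
    let base = "https://example.com/docs/guide/page.html";
    for r in ["img/logo.png", "../api.html#intro", "/about", "//cdn.example.com/a.js"] {
        println!("{r} -> {:?}", resolve(base, r));
    }
}
```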
The JavaScript implementation supports two browser engines:
- Puppeteer (default): Mature, well-tested Chrome automation
- Playwright: Cross-browser automation with similar capabilities
The Rust implementation uses:
- browser-commander: A Rust crate for browser automation using chromiumoxide
Unlicense: This is free and unencumbered software released into the public domain. You are free to copy, modify, publish, use, compile, sell, or distribute this software for any purpose, commercial or non-commercial, and by any means. See https://unlicense.org for details.
- browser-commander - Browser automation library used in the Rust implementation
- turndown - HTML to Markdown converter used in the JS implementation
- html2md - HTML to Markdown converter used in the Rust implementation