susm recursively crawls a website, following HTML links, scripts, stylesheets and sitemaps.
For each file it encounters that references or includes a source map
(e.g., JavaScript bundles, CSS files),
it attempts to locate and download that map.
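As a rough illustration of that step (a sketch only, not susm's actual code), such a reference is typically a trailing `sourceMappingURL` comment at the end of the file:

```rust
// Illustrative sketch, not susm's actual code: look for a trailing
// `sourceMappingURL` comment in a fetched JavaScript or CSS file.
fn source_map_url(body: &str) -> Option<&str> {
    body.lines().rev().find_map(|line| {
        let line = line.trim();
        // JS uses `//# sourceMappingURL=...`; CSS uses `/*# sourceMappingURL=... */`.
        line.strip_prefix("//# sourceMappingURL=")
            .or_else(|| {
                line.strip_prefix("/*# sourceMappingURL=")
                    .map(|rest| rest.trim_end_matches("*/"))
            })
            .map(str::trim)
    })
}
```

A relative value such as `app.js.map` would then be resolved against the URL of the file that referenced it before being downloaded.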
susm then attempts to extract any source code files and write them to disk,
preserving their relative paths as defined in the map.
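The extraction step can be pictured with a minimal sketch along these lines, assuming a map whose `sources` and `sourcesContent` fields are populated and using serde/serde_json for parsing; it is illustrative only, not susm's implementation:

```rust
// Illustrative sketch, not susm's actual code: write the sources embedded
// in a source map to disk, preserving the relative paths it records.
use std::{fs, io, path::Path};

use serde::Deserialize;

#[derive(Deserialize)]
struct SourceMap {
    sources: Vec<String>,
    #[serde(rename = "sourcesContent", default)]
    sources_content: Vec<Option<String>>,
}

fn unpack(map_json: &str, out_dir: &Path) -> io::Result<()> {
    let map: SourceMap = serde_json::from_str(map_json)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    for (source, content) in map.sources.iter().zip(&map.sources_content) {
        if let Some(content) = content {
            // Drop bundler prefixes such as `webpack:///` and any leading `/`
            // so the path stays relative to the output directory.
            // (A real tool would also sanitize `..` components.)
            let rel = source
                .trim_start_matches("webpack:///")
                .trim_start_matches('/');
            let path = out_dir.join(rel);
            if let Some(parent) = path.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::write(&path, content)?;
        }
    }
    Ok(())
}
```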
First, clone the repository:
git clone https://github.com/dixslyf/susm.git
cd susm

To build the scraper, run:

cargo build --release

The compiled binary will be available at target/release/susm (assuming Cargo's default target directory).
This repository provides a Nix flake.
To build the scraper with Nix, run:
nix build github:dixslyf/susm

To run the scraper:

nix run github:dixslyf/susm

susm has two primary modes of operation:
- Crawl a website and unpack discovered source maps:

  susm site <URL> [OPTIONS]

- Unpack a single local source map file:

  susm file <PATH> [OPTIONS]
For additional options, run:
susm --help

susm applies a polite crawling policy by default.
Requests are rate-limited per host to avoid overloading servers.
By default, susm waits 500 milliseconds between requests with a slight random jitter.
The request interval can be adjusted with the --request-interval (-i) flag.
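The delay logic amounts to something like the following sketch (the 20% jitter bound is an arbitrary choice for illustration, not necessarily what susm uses):

```rust
// Illustrative sketch, not susm's actual code: sleep for the configured
// interval plus a small random jitter before the next request to a host.
use std::{thread, time::Duration};

use rand::Rng;

fn polite_delay(base_ms: u64) {
    // Up to 20% extra jitter on top of the base interval (the bound is
    // an arbitrary choice for this sketch).
    let jitter_ms = rand::thread_rng().gen_range(0..=base_ms / 5);
    thread::sleep(Duration::from_millis(base_ms + jitter_ms));
}
```

With the defaults described above, `base_ms` would be 500.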
susm also respects the robots.txt exclusion standard.
Before crawling, it retrieves and parses the site’s robots.txt file (if present)
and skips any paths disallowed for its user agent.
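A simplified sketch of such a check, which only handles literal `Disallow` prefixes for the wildcard user agent and ignores details like case-insensitive field names, `Allow` overrides, and wildcards (susm's actual handling may be more complete):

```rust
// Simplified sketch, not susm's actual code: collect `Disallow` rules for
// the wildcard user agent and check paths against them by prefix match.
fn disallowed_paths(robots_txt: &str) -> Vec<String> {
    let mut applies = false;
    let mut rules = Vec::new();
    for line in robots_txt.lines() {
        // Strip comments and surrounding whitespace.
        let line = line.split('#').next().unwrap_or("").trim();
        if let Some(agent) = line.strip_prefix("User-agent:") {
            applies = agent.trim() == "*";
        } else if applies {
            if let Some(path) = line.strip_prefix("Disallow:") {
                let path = path.trim();
                if !path.is_empty() {
                    rules.push(path.to_string());
                }
            }
        }
    }
    rules
}

fn is_allowed(rules: &[String], path: &str) -> bool {
    !rules.iter().any(|rule| path.starts_with(rule.as_str()))
}
```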