An experimental Rust library for reading and writing ᴡᴀᴄᴢ files.
With cargo installed, run the following command in your project directory:
cargo add wacksy
This library provides two main ᴀᴘɪ functions.
from_file() takes a ᴡᴀʀᴄ file and returns a structured representation of a ᴡᴀᴄᴢ object.
as_zip_archive() takes a ᴡᴀᴄᴢ object and zips it up to a byte array using rawzip.
use std::{error::Error, fs, path::Path};

use wacksy::WACZ; // assumes WACZ is exported at the crate root

fn main() -> Result<(), Box<dyn Error>> {
    let warc_file_path = Path::new("example.warc.gz"); // set path to your ᴡᴀʀᴄ file
    let wacz_object = WACZ::from_file(warc_file_path)?; // index the ᴡᴀʀᴄ and create a ᴡᴀᴄᴢ object
    let zipped_wacz: Vec<u8> = wacz_object.as_zip_archive()?; // zip up the ᴡᴀᴄᴢ
    fs::write("example.wacz", zipped_wacz)?; // write out to file
    Ok(())
}

See the documentation for more details.
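Before writing the bytes returned by as_zip_archive() to disk, a cheap sanity check can confirm they at least start like a ᴢɪᴘ file. The sketch below is not part of wacksy; `looks_like_zip` and `ZIP_LOCAL_FILE_HEADER` are hypothetical names, and the check assumes the archive has at least one entry and no data prepended before the first local file header.

```rust
/// Signature that opens a ZIP local file header: the bytes "PK\x03\x04".
const ZIP_LOCAL_FILE_HEADER: [u8; 4] = [0x50, 0x4B, 0x03, 0x04];

/// Hypothetical helper (not part of wacksy): returns true if `bytes`
/// begins with a ZIP local file header signature.
/// This only inspects the first four bytes; it does not validate the archive.
fn looks_like_zip(bytes: &[u8]) -> bool {
    bytes.starts_with(&ZIP_LOCAL_FILE_HEADER)
}
```

In the example above, this could be called as `looks_like_zip(&zipped_wacz)` just before `fs::write`.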
According to Ed Summers, a ᴡᴀᴄᴢ file is "really just a ᴢɪᴘ file that contains ᴡᴀʀᴄ data and metadata at predictable file locations." [1]
The example in the spec outlines what a ᴡᴀᴄᴢ file should contain:
archive
└── data.warc.gz
datapackage.json
datapackage-digest.json
indexes
└── index.cdx.gz
pages
└── pages.jsonl
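
Because the file locations are predictable, the layout above can be used as a simple checklist. The following is a minimal sketch, not part of wacksy: `EXAMPLE_ENTRIES` and `missing_entries` are hypothetical names, and the list is taken from the example tree only. The spec may permit other file names (for instance, differently named ᴡᴀʀᴄ files), so treat this as illustrative rather than as a validator.

```rust
/// Entry paths taken from the example layout above (illustrative only).
const EXAMPLE_ENTRIES: [&str; 5] = [
    "archive/data.warc.gz",
    "datapackage.json",
    "datapackage-digest.json",
    "indexes/index.cdx.gz",
    "pages/pages.jsonl",
];

/// Hypothetical helper: returns the example paths that do not appear
/// in `entry_names`, in the order they are listed in EXAMPLE_ENTRIES.
fn missing_entries(entry_names: &[&str]) -> Vec<&'static str> {
    EXAMPLE_ENTRIES
        .iter()
        .copied()
        .filter(|expected| !entry_names.contains(expected))
        .collect()
}
```

The entry names would come from whatever tool lists the archive's contents; an empty result means every path from the example layout is present.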
MIT © Bodleian Libraries and contributors
Footnotes
1. For more discussion of the concept, see the talk "Web Archives in Digital Repositories" by Ilya Kreymer and Ed Summers at Code4Lib 2022.