bodleian/wacksy

Wacksy


An experimental Rust library for reading and writing ᴡᴀᴄᴢ files.

Install

With cargo installed, run the following command in your project directory:

cargo add wacksy

Example

This library provides two main ᴀᴘɪ functions. from_file() takes a ᴡᴀʀᴄ file and returns a structured representation of a ᴡᴀᴄᴢ object. as_zip_archive() takes a ᴡᴀᴄᴢ object and zips it up to a byte array using rawzip.

use std::{error::Error, fs, path::Path};
use wacksy::WACZ; // assumes WACZ is exported at the crate root

fn main() -> Result<(), Box<dyn Error>> {
    let warc_file_path = Path::new("example.warc.gz"); // set path to your ᴡᴀʀᴄ file
    let wacz_object = WACZ::from_file(warc_file_path)?; // index the ᴡᴀʀᴄ and create a ᴡᴀᴄᴢ object
    let zipped_wacz: Vec<u8> = wacz_object.as_zip_archive()?; // zip up the ᴡᴀᴄᴢ
    fs::write("example.wacz", zipped_wacz)?; // write out to file
    Ok(())
}
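The example above hard-codes both file names. If you want the output name to follow the input name, a small helper along these lines may be useful. This is a minimal sketch using only the standard library; `wacz_output_path` is a hypothetical name and is not part of wacksy.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of wacksy): derive an output path for
/// the WACZ file from the path of the input WARC file.
fn wacz_output_path(warc_path: &Path) -> PathBuf {
    // Fall back to a generic name if the path has no usable file name.
    let file_name = warc_path
        .file_name()
        .and_then(|name| name.to_str())
        .unwrap_or("output");
    // Strip a ".warc.gz" or ".warc" suffix if present; otherwise keep the name as-is.
    let stem = file_name
        .strip_suffix(".warc.gz")
        .or_else(|| file_name.strip_suffix(".warc"))
        .unwrap_or(file_name);
    // Replace the final path component, keeping any parent directories.
    warc_path.with_file_name(format!("{stem}.wacz"))
}
```

The returned path could then be passed to fs::write() in place of the literal "example.wacz".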

See the documentation for more details.

Background

According to Ed Summers, a ᴡᴀᴄᴢ file is "really just a ᴢɪᴘ file that contains ᴡᴀʀᴄ data and metadata at predictable file locations."1

The example in the spec outlines what a ᴡᴀᴄᴢ file should contain:

archive
└── data.warc.gz
datapackage.json
datapackage-digest.json
indexes
└── index.cdx.gz
pages
└── pages.jsonl
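As a rough illustration of "predictable file locations", the layout above can be written down as a list of entry paths and checked against the entry names found in a ᴢɪᴘ archive. This is a sketch under assumptions: the paths are copied from the example layout (real ᴡᴀᴄᴢ files may name their ᴡᴀʀᴄ and index files differently), and `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, not part of wacksy or the spec.

```rust
/// Entry paths copied from the example layout above (an assumption:
/// real archives may use different file names inside each directory).
const EXPECTED_ENTRIES: [&str; 5] = [
    "archive/data.warc.gz",
    "datapackage.json",
    "datapackage-digest.json",
    "indexes/index.cdx.gz",
    "pages/pages.jsonl",
];

/// Hypothetical helper: return the expected entries that are absent
/// from `entry_names`, in the order they are listed above.
fn missing_entries(entry_names: &[&str]) -> Vec<&'static str> {
    EXPECTED_ENTRIES
        .iter()
        .copied()
        .filter(|expected| !entry_names.contains(expected))
        .collect()
}
```

The entry names would come from whichever ᴢɪᴘ reader you use to list the archive; an empty result means every path from the example layout is present.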

Similar libraries

License

MIT © Bodleian Libraries and contributors

Footnotes

  1. For more discussion of the concept, see the talk "Web Archives in Digital Repositories" by Ilya Kreymer and Ed Summers at Code4Lib 2022.
