yangming-zhang/jsonl-tools

jsonl-tools

Work with JSON Lines files from the terminal. No dependencies, no install — just copy the script and run it.

jsonl filter "status>=400" access.log | jsonl select "ts,path,status" | jsonl to-csv > errors.csv

Why

I kept reaching for jq for basic JSONL tasks, but jq's syntax is its own puzzle. This is simpler: one Python file, commands that read like English.

Setup

# Option 1: download and run directly
curl -O https://raw.githubusercontent.com/yangming-zhang/jsonl-tools/main/jsonl.py
python jsonl.py --help

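# Option 2: make it a command (writing to /usr/local/bin may need sudo; running it directly assumes jsonl.py starts with a Python shebang line)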
curl -o /usr/local/bin/jsonl https://raw.githubusercontent.com/yangming-zhang/jsonl-tools/main/jsonl.py
chmod +x /usr/local/bin/jsonl
jsonl --help

Requires Python 3.9+. That's it.

Commands

head      first N records (default 10)
tail      last N records
count     total record count
keys      field names + how often they appear
filter    keep records matching a condition
select    keep only certain fields
pretty    indented output
to-csv    export to CSV
stats     numeric stats (mean, p95, p99, …)
sample    random sample (reservoir, works on streams)
flatten   collapse nested objects to dot-notation keys
dedup     remove duplicate records
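
The sample command is described as a reservoir sample, which is what lets it work on streams of unknown length. As a rough illustration of that technique (not the code in jsonl.py, which may differ), here is a minimal sketch of the classic "Algorithm R". The function name reservoir_sample and its seed parameter are made up for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Pick up to k items uniformly at random from a stream of unknown length.

    Hypothetical helper for illustration; only k items are held in memory.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1) by drawing a
            # slot in [0, i] and replacing only if it falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Because only the reservoir is kept, memory use depends on the sample size and not on the input size, which fits the "safe on huge files" claim.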

Examples

# How many records?
python jsonl.py count events.jsonl

# Last 20 records
python jsonl.py tail -n 20 events.jsonl

# HTTP errors only
python jsonl.py filter "status>=400" access.jsonl

# Name matches a pattern
python jsonl.py filter "user.name~=^admin" users.jsonl

# Grab a few fields
python jsonl.py select "id,ts,status" events.jsonl

# Quick stats on response times
python jsonl.py stats duration_ms events.jsonl

# 500 random records (safe on huge files)
python jsonl.py sample 500 huge.jsonl

# Flatten nested JSON, then to CSV
python jsonl.py flatten nested.jsonl | python jsonl.py to-csv > flat.csv

# Drop duplicate events by ID
python jsonl.py dedup -k event_id events.jsonl

# Chain it
cat *.jsonl | python jsonl.py filter "level=error" | python jsonl.py keys
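
For readers wondering what flatten does to a record before it reaches to-csv, the sketch below shows one plausible way to collapse nested objects into dot-notation keys. It is an assumption about the behavior, not the script's actual code: the function name flatten_record is hypothetical, and leaving lists and empty objects untouched is a choice made for this example.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts into a single level with dot-joined keys.

    Hypothetical helper: lists and empty dicts are kept as plain values.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested objects, extending the key path.
            flat.update(flatten_record(value, path))
        else:
            flat[path] = value
    return flat


nested = {"user": {"name": "ada", "geo": {"city": "Oslo"}}, "status": 200}
print(flatten_record(nested))
# {'user.name': 'ada', 'user.geo.city': 'Oslo', 'status': 200}
```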

Filter expressions

expr              meaning
status=200        equals (string comparison)
status!=404       not equals
msg~=timeout      value matches regex
age>=18           numeric ≥
score<0.5         numeric <
user.role=admin   nested field (dot-notation)
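
To make the table concrete, here is a minimal sketch of how expressions like these could be parsed and evaluated. It is not taken from jsonl.py. The names compile_filter and lookup are hypothetical, the > and <= operators are assumed by analogy with the ones listed, and treating a missing field as "no match" is a choice made for this example.

```python
import operator
import re
from typing import Any, Callable, Dict

_MISSING = object()
# Field name, then the operator (two-character forms tried first), then the value.
_EXPR = re.compile(r"^(?P<field>[^=!~<>]+)(?P<op>!=|~=|>=|<=|=|>|<)(?P<value>.*)$")
_NUMERIC = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def lookup(record: Any, path: str) -> Any:
    """Follow a dot-notation path such as 'user.role' through nested dicts."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def compile_filter(expr: str) -> Callable[[Dict[str, Any]], bool]:
    """Turn an expression like 'status>=400' into a predicate over records."""
    match = _EXPR.match(expr)
    if match is None:
        raise ValueError(f"cannot parse filter expression: {expr!r}")
    field = match.group("field").strip()
    op = match.group("op")
    value = match.group("value").strip()
    pattern = re.compile(value) if op == "~=" else None

    def predicate(record: Dict[str, Any]) -> bool:
        actual = lookup(record, field)
        if actual is _MISSING:
            return False  # assumption: a missing field never matches
        if op == "=":
            return str(actual) == value  # string comparison, as in the table
        if op == "!=":
            return str(actual) != value
        if op == "~=":
            return pattern.search(str(actual)) is not None
        try:
            return _NUMERIC[op](float(actual), float(value))
        except (TypeError, ValueError):
            return False  # non-numeric values fail numeric comparisons

    return predicate
```

Under these assumptions, compile_filter("status>=400") accepts {"status": 404} and rejects {"status": 200}, and compile_filter("user.role=admin") reads the role field nested inside user.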

Notes

  • Reads from stdin if no file is given, so you can pipe freely
  • Bad JSON lines are skipped with a warning, not a crash
  • to-csv buffers the whole file (to collect headers); everything else is streaming
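
The notes above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the script's source: read_records and write_csv are hypothetical names, the warning format is invented, and skipping valid JSON values that are not objects is a choice made for this example.

```python
import csv
import json
import sys
from typing import Any, Dict, Iterable, Iterator, List, TextIO


def read_records(lines: Iterable[str], warn: TextIO = sys.stderr) -> Iterator[Dict[str, Any]]:
    """Yield one dict per valid line; warn about bad lines and keep going."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"warning: skipping line {lineno}: {exc.msg}", file=warn)
            continue
        if not isinstance(record, dict):
            # Assumption for this sketch: only JSON objects count as records.
            print(f"warning: skipping line {lineno}: not a JSON object", file=warn)
            continue
        yield record


def write_csv(records: Iterable[Dict[str, Any]], out: TextIO) -> None:
    """Buffer every record so the header can cover keys that appear late."""
    rows: List[Dict[str, Any]] = list(records)
    headers: List[str] = []
    seen = set()
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                headers.append(key)
    writer = csv.DictWriter(out, fieldnames=headers, restval="")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Read the file named on the command line, or stdin when none is given.
    source = open(sys.argv[1], encoding="utf-8") if len(sys.argv) > 1 else sys.stdin
    write_csv(read_records(source), sys.stdout)
```

The full buffering in write_csv is the cost of not knowing the column set until the last record has been seen. A generator such as read_records holds one line at a time.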

License

MIT
