Command-line wrapper around Mistral OCR that:
- Runs OCR on PDFs, images, or URLs (
mistral-ocr-latest) - Writes Markdown (with page images) and the raw JSON response
- Optionally produces strict, structured JSON via a follow-up LLM call (e.g.,
pixtral-12b-latest) with either a generic JSON object or validation against your JSON Schema - Built with Typer and uses
uvas the package manager
-
Inputs: local files or URLs (PDF, PNG/JPG/WebP/TIFF, etc.)
-
Outputs:
doc.md— stitched per-page Markdown; page images saved alongsideraw.json— OCR API responsestructured.json— LLM-produced structured JSON (optional)
-
Controls:
- Select pages:
--pages 0,2,5 - Include/omit base64 images inside OCR JSON
- Provide a custom prompt or a JSON Schema for strict structured output
- Select pages:
-
Python 3.10+
-
uvinstalled -
Mistral API key in
MISTRAL_API_KEY- macOS/Linux:
export MISTRAL_API_KEY=sk-... - Windows (PowerShell):
$env:MISTRAL_API_KEY="sk-..."
- macOS/Linux:
git clone <this-repo-url>.git
cd <repo>
# ensure your API key is set
export MISTRAL_API_KEY=sk-...
# run the CLI (uv will resolve deps automatically)
uv run uni-ocr --helpCommon one-liner from this repo:
uv run uni-ocr structured ./samples/adhaar.jpg --with-image --llm-model pixtral-12b-latest > result.jsonLocal PDF:
uv run uni-ocr ocr ./samples/sample.pdf --out ocr_outLocal image:
uv run uni-ocr ocr ./samples/id.jpg --out ocr_imgRemote URL (auto-detects image vs document):
uv run uni-ocr ocr "https://example.com/file.pdf" --out ocr_urlProcess specific pages (zero-based):
uv run uni-ocr ocr ./samples/long.pdf --pages 0,3,5Exclude base64 images from the OCR JSON:
uv run uni-ocr ocr ./samples/id.jpg --no-imagesOutputs:
<out>/
├── doc.md
├── raw.json
├── page-000-img-00.jpg
└── page-000-img-01.png
Vision model (includes the image in the chat request):
uv run uni-ocr structured ./samples/adhaar.jpg --with-image --llm-model pixtral-12b-latest > result.jsonText-only model (Markdown only):
uv run uni-ocr structured ./samples/id.jpg --no-image --llm-model ministral-8b-latest > result.jsonCustom prompt:
uv run uni-ocr structured ./samples/id.jpg \
--with-image \
--prompt "Return JSON with keys: DOC_TYPE, ISSUER, PEOPLE[], DATES[] only."Output:
ocr_out_structured/
└── structured.json
Create schema.json:
{
"type": "object",
"properties": {
"DOC_TYPE": { "type": "string" },
"ISSUER": { "type": "string" },
"ENTITIES": { "type": "array", "items": { "type": "string" } }
},
"required": ["DOC_TYPE"]
}Run:
uv run uni-ocr structured ./samples/id.jpg \
--rf json_schema \
--schema schema.json \
--with-image \
--llm-model pixtral-12b-latest > structured.json- Use
--no-imagesif you want a smaller, more diff-friendlyraw.json. - For multi-page PDFs,
--pageshelps target just the relevant pages. - Keep your prompt concise and deterministic;
temperature=0is used by default.