Practical code examples for using ContextPilot.
| Example | Description | Prerequisites |
|---|---|---|
simple_reorder_example.py |
Minimal "hello world" — call /reorder with 4 contexts |
ContextPilot server (stateless) |
pipeline_examples.py |
Pipeline API usage (BM25, FAISS, generation) | Corpus file |
http_server_example.py |
Stateful index server (build, proxy, eviction) | ContextPilot + inference engine |
backend_e2e_check.py |
Backend eviction sync (SGLang / vLLM) | ContextPilot + inference engine |
stateless_batch_example.py |
Stateless batch reordering (3 approaches) | ContextPilot server (stateless) |
stateless_sglang_e2e.py |
Stateless reordering → inference e2e | ContextPilot + inference engine |
pageindex_e2e_example.py |
PageIndex tree → ContextPilot scheduling with prefix sharing | ContextPilot (demo: none; full: PageIndex + OpenAI) |
mem0_bench_simple.py |
A/B benchmark: Baseline vs ContextPilot (cache hit rate) | mem0ai + OpenAI key + inference engine |
mem0_locomo_example.py |
LoCoMo multi-turn benchmark with TTFT & F1 scoring | mem0ai + OpenAI key + inference engine |
# Terminal 1: Start server in stateless mode
python -m contextpilot.server.http_server --port 8765 --stateless
# Terminal 2: Run
python examples/simple_reorder_example.pypython examples/pipeline_examples.py# Terminal 1: Start inference engine with ContextPilot eviction callback
# SGLang:
CONTEXTPILOT_INDEX_URL=http://localhost:8765 python -m sglang.launch_server \
--model-path Qwen/Qwen3-4B --port 30000 --schedule-policy lpm
# or vLLM:
CONTEXTPILOT_INDEX_URL=http://localhost:8765 python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3-4B --port 30000 --enable-prefix-caching
# Terminal 2: Start ContextPilot server
python -m contextpilot.server.http_server --port 8765 --infer-api-url http://localhost:30000
# Terminal 3: Run example
python examples/http_server_example.py# Terminal 1: Start inference engine
python -m sglang.launch_server --model-path Qwen/Qwen3-4B --port 30000
# or: python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen3-4B --port 30000 --enable-prefix-caching
# Terminal 2: Start ContextPilot (stateless)
python -m contextpilot.server.http_server --port 8765 --stateless --infer-api-url http://localhost:30000
# Terminal 3: Run
python examples/stateless_sglang_e2e.py# Terminal 1: Start ContextPilot
python -m contextpilot.server.http_server --port 8765
# Terminal 2: Start backend
# SGLang:
CONTEXTPILOT_INDEX_URL=http://localhost:8765 python -m sglang.launch_server \
--model-path Qwen/Qwen2.5-7B-Instruct --port 30000
# vLLM:
CONTEXTPILOT_INDEX_URL=http://localhost:8765 python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-7B-Instruct --port 8000 --enable-prefix-caching
# NOTE: flash-attn / flashinfer installation is left to the user.
# Terminal 3: Run (route through proxy for full rid tracking)
python examples/backend_e2e_check.py --backend-url http://localhost:8765 --cp-url http://localhost:8765# No API key needed — uses bundled Disney earnings tree (41 nodes)
python examples/pageindex_e2e_example.pyTo generate a tree from your own PDF, use PageIndex:
pip install pageindex
# See https://github.com/yinsicheng/PageIndex for usageFor the full LLM pipeline (tree search + answer generation via PageIndexRetriever):
pip install openai
export OPENAI_API_KEY="your-key"
python examples/pageindex_e2e_example.py --tree path/to/my_tree.json -q "What was DTC revenue?"pip install mem0ai
export OPENAI_API_KEY="your-key"
# Start inference engine + ContextPilot (live mode), then:
python examples/mem0_bench_simple.pyexamples/
├── simple_reorder_example.py # Minimal hello world
├── pipeline_examples.py # Pipeline API (BM25, FAISS, generation)
├── http_server_example.py # Stateful index server
├── backend_e2e_check.py # Backend eviction sync e2e (SGLang / vLLM)
├── stateless_batch_example.py # Stateless batch reordering
├── stateless_sglang_e2e.py # Stateless → inference e2e
├── pageindex_e2e_example.py # PageIndex + ContextPilot
├── mem0_bench_simple.py # A/B cache hit benchmark
├── mem0_locomo_example.py # LoCoMo multi-turn benchmark
├── offline/ # Batch processing utilities
│ └── prepare_batch.py # Batch optimization
├── construct_rag_data/ # Data retrieval utilities
│ ├── multihopRAG_bm25.py # BM25 retrieval
│ └── multihopRAG_faiss.py # FAISS retrieval
└── data/ # Sample data
└── disney_q1_fy25_tree.json # 41-node PageIndex tree (Disney Q1 FY25)
Queries (JSONL):
{"qid": 0, "text": "What is machine learning?"}Corpus (JSONL):
{"doc_id": 0, "text": "Machine learning is a subset of AI..."}