benoitpetit/mira

MIRA

Memory with Information-theoretic Relevance Allocation

Long-term Memory System for LLMs with Optimal Context Budget Allocation

100% Local • Deterministic (embedding variance < 1e-6) • Clean Architecture

API Reference • Changelog • Skill • Français • SOUL Extension


What is MIRA?

MIRA is a long-term memory system for Large Language Models (LLMs). Unlike memory systems that only store and retrieve, MIRA uses information-theoretic allocation to decide which memories enter the context window, and at what level of detail, so that a fixed token budget carries as much useful information as possible.

SOUL Extension: Identity Preservation

MIRA answers "What does the agent know?" A complete agent also needs an answer to a second question: "Who is the agent?"

SOUL (System for Observed Unique Legacy) is an optional identity extension for MIRA that captures, stores, and recalls the personality, voice, and values of AI agents across sessions and model changes.

To enable SOUL, start MIRA with the --with-soul flag or set soul.enabled: true in config.yaml. When enabled, SOUL adds 8 MCP tools (16 in total) for:

  • Capturing identity from conversations
  • Recalling identity prompts for LLM context injection
  • Detecting identity drift after model changes
  • Generating reinforcement prompts after model swaps

SOUL is opt-in and disabled by default; MIRA works on its own with its 8 tools. When SOUL is enabled, both share the same SQLite database.

| Configuration | Tools | What it answers |
|---|---|---|
| MIRA only | 8 mira_* | "What does the agent know?" |
| MIRA + SOUL | 16 mira_* + soul_* | "What does the agent know?" + "Who is the agent?" |

The Problem MIRA Solves

Modern LLMs suffer from a fundamental problem: the context window is limited (4K-128K tokens), but conversations and projects span thousands of interactions. How do we decide what to keep in context?

Traditional approaches fail:

  • Simple RAG: Retrieval based only on similarity, ignores information density
  • Sliding window: Loses critical information from the beginning
  • Static summarization: Doesn't adapt to the current query
  • Basic Vector DB: O(n) complexity, no budget management

MIRA provides the solution:

  • [+] Context Budget Allocation: Optimizes every token across 6 dimensions
  • [+] Information Density: Prioritizes fact-dense memories
  • [+] Temporal Coherence: Maintains narrative continuity
  • [+] Causal Graph: Understands cause-effect relationships
  • [+] O(log n) Search: HNSW for millions of memories
  • [+] Clean Architecture: Maintainable, testable, extensible

The Memory Revolution for LLMs

What's New in MIRA

1. Context Budget Allocation (CBA)

Instead of simply retrieving the "most similar", MIRA solves an optimization problem under constraint: maximize useful information within a fixed token budget.

Score(m) = Relevance × Density × Recency × (1-Overlap) × Coherence × CausalPenalty

2. Triple Representation (T0/T1/T2)

Each memory exists in 3 forms for different uses:

  • T0 (Verbatim): Full original text
  • T1 (Fingerprint): Structured extracted facts (~15% of tokens)
  • T2 (Embedding): 384D semantic vector for search

3. Integrated Causal Graph

Automatic detection of relations (BECAUSE, TRIGGERED, CONTRADICTS, UPDATES, RESOLVES) to trace reasoning chains.

4. Adaptive Rendering

Based on remaining budget, MIRA intelligently chooses the detail level:

  • Header (2-5 tokens): Reference only
  • Fingerprint (~15% tokens): Essential facts
  • Verbatim (100% tokens): Full text

How It Works

Overview Flow

┌─────────────────────────────────────────────────────────────────────────┐
│                         MEMORY STORAGE                                  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Input Text         T1,T2 Extraction        Atomic Storage             │
│   ┌─────────┐       ┌──────────────┐          ┌─────────────────┐       │
│   │"We      │──────→│  Fingerprint │─────────→│  SQLite + HNSW  │       │
│   │ decided │       │  + Embedding │          │  (WAL Mode)     │       │
│   │ to use  │       └──────────────┘          └─────────────────┘       │
│   │PostgreSQL"            │                       │                     │
│   └─────────┘             ↓                       ↓                     │
│                        T1: {                 Vector Index               │
│                          - decision: "PostgreSQL"  ℝ³⁸⁴                 │
│                          - rejected: ["MySQL",     HNSW O(log n)        │
│                                      "MongoDB"]                         │
│                          - reason: ["ACID", "Exp"]                      │
│                          - type: DECISION                               │
│                                                                         │
│                         T2: [0.23, -0.15, 0.89, ...] 384D               │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────┐
│                         RETRIEVAL (RECALL)                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Query "Why PostgreSQL?"                                               │
│       │                                                                 │
│       ▼                                                                 │
│   ┌─────────────┐    ┌─────────────────┐    ┌──────────────────────┐    │
│   │ Embedding   │───→│  HNSW Search    │───→│  Composite Scoring   │    │
│   │ Query       │    │  Top 100        │    │  CBA Algorithm       │    │
│   │ ℝ³⁸⁴        │    │  O(log n)       │    │  O(n log n)          │    │
│   └─────────────┘    └─────────────────┘    └──────────────────────┘    │
│                                                        │                │
│                                                        ▼                │
│                                              Greedy Selection           │
│                                              with 4000 token budget     │
│                                                        │                │
│       ┌────────────────────────────────────────────────┘                │
│       ▼                                                                 │
│   Optimized Result:                                                     │
│   ┌───────────────────────────────────────────────────────────────┐     │
│   │ [1] Fingerprint: "PostgreSQL Decision (ACID, expertise)" 45tk │     │
│   │ [2] Verbatim: "Meeting 04/15 - DB discussion..."         120tk│     │
│   │ [3] Header: "Sprint 5 deadline"                           5tk │     │
│   │ ...                                                           │     │
│   │ Total: 3987/4000 tokens (99.7% utilization)                   │     │
│   └───────────────────────────────────────────────────────────────┘     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The CBA Composite Score

For each candidate memory, MIRA calculates a multidimensional score:

┌─────────────────────────────────────────────────────────────────────┐
│                     CBA SCORE FORMULA                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   S(m) = ρ × δ × η × (1-σ) × τ × χ × 𝟙[ρ>θ]                         │
│                                                                     │
│   where:                                                            │
│   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━    │
│   ρ (rho)    = Semantic Relevance      cos(embedding_m, embedding_q)│
│   δ (delta)  = Information Density     sigmoid(facts/√tokens)       │
│   η (eta)    = Temporal Weight         exp(-λ × age)                │
│   σ (sigma)  = Max Overlap             sim(m, already_selected)     │
│   τ (tau)    = Session Boost           +20% if same session         │
│   χ (chi)    = Causal Penalty          avoids long chains           │
│   𝟙[ρ>θ]     = Relevance Threshold     eliminates if ρ < 0.6        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
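The formula above can be sketched in Go as follows. This is a minimal illustration, not MIRA's implementation: the `Factors` type and `Score` function are hypothetical names, and the density term assumes the sigmoid takes the form sigmoid(k × (x − mu)) with the `k` and `mu` values from the `density_sigmoid` section of config.yaml.

```go
package main

import (
	"fmt"
	"math"
)

// Factors holds the inputs to the composite score for one candidate.
// All names here are illustrative, not MIRA's actual types.
type Factors struct {
	Relevance   float64 // ρ: cosine similarity to the query
	Facts       int     // number of extracted facts in the memory
	Tokens      int     // token count of the memory
	AgeDays     float64 // age of the memory in days
	Lambda      float64 // λ: decay rate per day for the memory type
	MaxOverlap  float64 // σ: max similarity to already selected memories
	SameSession bool    // true if the session boost τ applies
	CausalLinks int     // causal links to already selected memories
}

// Score returns ρ × δ × η × (1-σ) × τ × χ, or 0 when ρ ≤ θ.
func Score(f Factors, theta float64) float64 {
	if f.Relevance <= theta || f.Tokens <= 0 {
		return 0
	}
	x := float64(f.Facts) / math.Sqrt(float64(f.Tokens))
	// Assumed sigmoid form: 1 / (1 + exp(-k(x - mu))) with k=2.0, mu=0.3.
	density := 1 / (1 + math.Exp(-2.0*(x-0.3)))
	recency := math.Exp(-f.Lambda * f.AgeDays)
	tau := 1.0
	if f.SameSession {
		tau = 1.2 // +20% session boost
	}
	chi := math.Exp(-0.15 * float64(f.CausalLinks)) // causal penalty
	return f.Relevance * density * recency * (1 - f.MaxOverlap) * tau * chi
}

func main() {
	m := Factors{Relevance: 0.8, Facts: 6, Tokens: 200, AgeDays: 10, Lambda: 0.001}
	fmt.Printf("score: %.4f\n", Score(m, 0.6))
}
```

Because the factors are multiplied, a single zero (for example, full overlap with an already selected memory) removes the candidate regardless of how relevant it is.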

3-Level Architecture (T0/T1/T2)

Why 3 Levels?

The human brain doesn't record everything with the same fidelity. MIRA mimics this hierarchy:

┌─────────────────────────────────────────────────────────────────────┐
│                        T0/T1/T2 HIERARCHY                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   LEVEL T0 - VERBATIM (Episodic Memory)                             │
│   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   │
│   "Meeting April 15, 2024 at 14:30.                                 │
│    Participants: Marie (Tech Lead), Jean (DevOps), Sophie (PO)      │
│    Marie: 'I propose we migrate to PostgreSQL for v2'               │
│    Jean: 'It requires training, but it's more robust'               │
│    Sophie: 'Client approves for Sprint 5'                           │
│    Final decision: PostgreSQL migration approved"                   │
│                                                                     │
│    • Storage: Full UTF-8 text (max 64KB)                            │
│    • Usage: Rich context when budget allows                         │
│    • Cost: ~200 tokens                                              │
│                                                                     │
│                              ↓ NLP Extraction                       │
│                                                                     │
│   LEVEL T1 - FINGERPRINT (Semantic Memory)                          │
│   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   │
│   {                                                                 │
│     "type": "decision",                                             │
│     "decision": "PostgreSQL Migration",                             │
│     "rejected": ["MySQL", "MongoDB"],                               │
│     "reason": ["ACID Robustness", "Client validation"],             │
│     "assignee": "Jean",                                             │
│     "deadline": "Sprint 5",                                         │
│     "validated_by": "Sophie (PO)"                                   │
│   }                                                                 │
│                                                                     │
│    • Storage: Structured canonical JSON                             │
│    • Usage: Dense context when budget is medium                     │
│    • Cost: ~30 tokens (15% of T0)                                   │
│                                                                     │
│                              ↓ Embedding                            │
│                                                                     │
│   LEVEL T2 - EMBEDDING (Procedural Memory)                          │
│   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   │
│   [0.23, -0.15, 0.89, -0.42, 0.67, ...]  // 384 dimensions          │
│                                                                     │
│    • Storage: float32[384] vector                                   │
│    • Usage: O(log n) vector search                                  │
│    • Cost: 0 tokens (search only)                                   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Memory Types and Decay

| Type | λ (day⁻¹) | Half-life | Auto-Archive | Usage |
|---|---|---|---|---|
| decision | 0.001 | ~693 days | No | Architectural decisions |
| fact | 0.005 | ~139 days | No | Knowledge, facts |
| preference | 0.01 | ~69 days | No | User preferences |
| session_note | 0.1 | ~7 days | 30 days | Session notes |
| debug_log | 0.5 | ~1.4 days | 7 days | Debug logs |
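The half-lives in the table follow from the exponential temporal weight η = exp(-λ × age): the weight halves when age = ln(2)/λ. The short sketch below recomputes them; the function names are illustrative and not taken from MIRA's code.

```go
package main

import (
	"fmt"
	"math"
)

// halfLifeDays returns the age (in days) at which the temporal weight
// drops to 0.5 for a given decay rate λ (per day).
func halfLifeDays(lambda float64) float64 {
	return math.Ln2 / lambda
}

// temporalWeight returns η = exp(-λ × age).
func temporalWeight(lambda, ageDays float64) float64 {
	return math.Exp(-lambda * ageDays)
}

func main() {
	rates := []struct {
		name   string
		lambda float64
	}{
		{"decision", 0.001},
		{"fact", 0.005},
		{"preference", 0.01},
		{"session_note", 0.1},
		{"debug_log", 0.5},
	}
	for _, r := range rates {
		fmt.Printf("%-12s half-life %.1f days, weight after 30 days %.3f\n",
			r.name, halfLifeDays(r.lambda), temporalWeight(r.lambda, 30))
	}
}
```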

The CBA Algorithm

Context Budget Allocator v2

┌─────────────────────────────────────────────────────────────────────┐
│                    CBA ALGORITHM - O(n²)                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  INPUT:  Query q, Budget B (tokens), Wing w, Room r                 │
│  OUTPUT: List of memories with render mode                          │
│                                                                     │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   │
│                                                                     │
│  1. EMBEDDING                                                       │
│     e_q ← Embed(q) with LRU cache (1000 entries)                    │
│                                                                     │
│  2. VECTOR SEARCH                                                   │
│     C ← HNSW_Search(e_q, N=100, w, r)        # O(log n)             │
│     If HNSW not ready: C ← SQLite_Search(e_q, N=100)  # Fallback    │
│                                                                     │
│  3. EARLY PRUNING                                                   │
│     C' ← {c ∈ C : ρ(c,q) > 0.6}                                     │
│     If C' = ∅: C' ← top-5(C) by ρ                                  │
│                                                                     │
│  4. INITIAL SCORING                                                 │
│     For each c ∈ C':                                                │
│        c.score ← ρ(c) × δ_sigmoid(c) × η_recency(c)                 │
│                                                                     │
│  5. GREEDY SELECTION WITH DYNAMIC RENORMALIZATION                   │
│     S ← ∅, tokens_used ← 0                                         │
│     PQ ← MaxHeap(C')  # by initial score                            │
│                                                                     │
│     WHILE PQ ≠ ∅ AND tokens_used < B:                              │
│        c ← Pop(PQ)                                                  │
│                                                                     │
│        # Dynamic recalculation (depends on already selected S)      │
│        c.σ_max ← max_{s∈S} similarity(c,s)                          │
│        c.χ ← exp(-0.15 × |causal_links(c,S)|)                       │
│        c.τ ← 1.2 if |time(c) - time(S)| < 2h else 1.0               │
│                                                                     │
│        adjusted_score ← c.score × (1-c.σ_max) × c.χ × c.τ           │
│                                                                     │
│        # Check if next has better score                             │
│        If PQ[0].score × 0.8 > adjusted_score:                       │
│           Push(PQ, c) with adjusted_score                           │
│           continue                                                  │
│                                                                     │
│        # Determine mode based on REMAINING BUDGET                   │
│        remaining ← B - tokens_used                                  │
│        mode ← ChooseMode(c, remaining)                              │
│        cost ← CalculateCost(c, mode)                                │
│                                                                     │
│        # Downgrade if necessary                                     │
│        If tokens_used + cost > B:                                   │
│           mode ← Downgrade(mode)  # Verbatim → Fingerprint → Header │
│           cost ← Recalculate(mode)                                  │
│           If tokens_used + cost > B: continue                       │
│                                                                     │
│        S ← S ∪ {c}, tokens_used ← tokens_used + cost                │
│                                                                     │
│  6. RETURN S sorted by descending score                             │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
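The core of step 5 is that a candidate's score depends on what has already been selected. The Go sketch below is a deliberately simplified version under stated assumptions: it keeps only the overlap factor (1-σ), uses a fixed token cost per candidate, and re-scans all remaining candidates on each step instead of using the lazy max-heap from the pseudocode, so it is easier to read but slower. The `Candidate` type and `Select` function are hypothetical names.

```go
package main

import "fmt"

// Candidate is an illustrative stand-in for a scored memory.
type Candidate struct {
	ID    string
	Score float64   // initial score (ρ × δ × η)
	Cost  int       // token cost in its render mode
	Vec   []float64 // unit-length embedding
}

// dot returns the dot product, which equals cosine similarity
// for unit-length vectors of equal dimension.
func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// Select greedily picks the candidate with the highest adjusted score
// (Score × (1 - max overlap with the selection)) that still fits the budget.
func Select(cands []Candidate, budget int) []Candidate {
	remaining := append([]Candidate(nil), cands...)
	var selected []Candidate
	used := 0
	for len(remaining) > 0 {
		best, bestScore := -1, -1.0
		for i, c := range remaining {
			if used+c.Cost > budget {
				continue // does not fit in the remaining budget
			}
			overlap := 0.0
			for _, s := range selected {
				if sim := dot(c.Vec, s.Vec); sim > overlap {
					overlap = sim
				}
			}
			if adj := c.Score * (1 - overlap); adj > bestScore {
				best, bestScore = i, adj
			}
		}
		if best < 0 {
			break // nothing else fits
		}
		selected = append(selected, remaining[best])
		used += remaining[best].Cost
		remaining = append(remaining[:best], remaining[best+1:]...)
	}
	return selected
}

func main() {
	cands := []Candidate{
		{ID: "A", Score: 0.90, Cost: 10, Vec: []float64{1, 0}},
		{ID: "B", Score: 0.85, Cost: 10, Vec: []float64{1, 0}}, // duplicate of A
		{ID: "C", Score: 0.50, Cost: 10, Vec: []float64{0, 1}},
	}
	for _, c := range Select(cands, 20) {
		fmt.Println(c.ID)
	}
}
```

In this example B has a higher initial score than C, but once A is selected B's overlap is 1, so C is chosen for the second slot.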

Adaptive Render Modes

| Remaining Budget | Mode | Tokens | Content |
|---|---|---|---|
| < 100 | Header | 2-5 | `[type\|date\|wing]` |
| < 1000 | Fingerprint | ~15% | Essential facts T1 |
| ≥ 1000 | Verbatim | 100% | Original text T0 |
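The thresholds above, combined with the downgrade step of the allocator, can be sketched as follows. This is an illustration under assumptions: the names `ChooseMode`, `Cost`, and `Fit` are hypothetical, a header is costed at a flat 5 tokens, a fingerprint at 15% of the verbatim size rounded up, and the downgrade loop continues until a mode fits.

```go
package main

import "fmt"

// RenderMode is ordered from cheapest to most detailed.
type RenderMode int

const (
	Header RenderMode = iota
	Fingerprint
	Verbatim
)

func (m RenderMode) String() string {
	return [...]string{"Header", "Fingerprint", "Verbatim"}[m]
}

// ChooseMode maps the remaining budget to a render mode.
func ChooseMode(remaining int) RenderMode {
	switch {
	case remaining < 100:
		return Header
	case remaining < 1000:
		return Fingerprint
	default:
		return Verbatim
	}
}

// Cost estimates the token cost of a memory in a given mode.
// verbatimTokens is the size of the full T0 text.
func Cost(mode RenderMode, verbatimTokens int) int {
	switch mode {
	case Header:
		return 5 // assumed flat header cost
	case Fingerprint:
		return (verbatimTokens*15 + 99) / 100 // ~15%, rounded up
	default:
		return verbatimTokens
	}
}

// Fit picks a mode for the remaining budget and downgrades
// (Verbatim → Fingerprint → Header) until the cost fits.
func Fit(verbatimTokens, remaining int) (RenderMode, int, bool) {
	for mode := ChooseMode(remaining); ; mode-- {
		if cost := Cost(mode, verbatimTokens); cost <= remaining {
			return mode, cost, true
		}
		if mode == Header {
			return Header, 0, false // even a header does not fit
		}
	}
}

func main() {
	mode, cost, ok := Fit(2000, 1500)
	fmt.Println(mode, cost, ok)
}
```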

Enhanced Recall Pipeline

MIRA's recall uses a multi-stage retrieval pipeline rather than vector similarity alone:

Query → Expansion → Dense (HNSW) + Lexical (FTS5) → RRF Fusion → Clustering → Tag Boost → Adaptive Threshold → CBA Greedy Selection

1. Query Expansion

Before embedding, MIRA generates semantically close variants of the query (cleaned, stop words removed, top keywords only) and averages their embeddings. This improves cross-lingual retrieval and robustness to vocabulary mismatch.
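The averaging step can be illustrated with a small Go sketch. It assumes the variant embeddings are already computed and that the mean is re-normalized to unit length so cosine similarity stays comparable; whether MIRA re-normalizes is an assumption, and `averageEmbeddings` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// averageEmbeddings returns the L2-normalized mean of the given vectors.
// All vectors must have the same dimension.
func averageEmbeddings(vecs [][]float64) ([]float64, error) {
	if len(vecs) == 0 {
		return nil, errors.New("no vectors to average")
	}
	dim := len(vecs[0])
	mean := make([]float64, dim)
	for _, v := range vecs {
		if len(v) != dim {
			return nil, fmt.Errorf("dimension mismatch: got %d, want %d", len(v), dim)
		}
		for i, x := range v {
			mean[i] += x
		}
	}
	var norm float64
	for i := range mean {
		mean[i] /= float64(len(vecs))
		norm += mean[i] * mean[i]
	}
	norm = math.Sqrt(norm)
	if norm == 0 {
		return mean, nil // zero vector cannot be normalized
	}
	for i := range mean {
		mean[i] /= norm
	}
	return mean, nil
}

func main() {
	// Two toy 2D "variant embeddings"; real embeddings are 384D.
	avg, err := averageEmbeddings([][]float64{{1, 0}, {0, 1}})
	fmt.Println(avg, err)
}
```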

2. Hybrid Search (Dense + Lexical)

  • Dense: HNSW O(log n) vector search
  • Lexical: SQLite FTS5 full-text search (auto-enabled if available)
  • Fusion: Reciprocal Rank Fusion (k=60) merges both rankings into a single candidate list
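Reciprocal Rank Fusion scores each candidate as the sum of 1/(k + rank) over every ranking in which it appears, with 1-based ranks. A minimal Go sketch of the technique follows; the `rrf` function and the alphabetical tie-break are illustrative choices, not MIRA's code.

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses several ranked lists of IDs into one list ordered by
// score(d) = Σ 1 / (k + rank_i(d)), where rank is 1-based.
func rrf(k float64, rankings ...[]string) []string {
	scores := map[string]float64{}
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	dense := []string{"a", "b", "c"}   // e.g. HNSW ranking
	lexical := []string{"b", "d", "a"} // e.g. FTS5 ranking
	fmt.Println(rrf(60, dense, lexical))
}
```

Candidates that appear in both rankings accumulate two terms, so they tend to rise above candidates found by only one method.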

3. Search-Time Clustering

Candidates are grouped by cosine similarity ≥ 0.88. Near-duplicates are collapsed, and only the best representative per cluster proceeds to scoring. This prevents budget waste on redundant memories.
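One simple way to realize this kind of deduplication is greedy leader clustering: visit candidates from best to worst and keep a candidate only if it is not too similar to a representative already kept. The sketch below shows that approach as an assumption; MIRA's actual clustering method is not specified here, and `dedupe` is a hypothetical name.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Candidate is an illustrative stand-in for a retrieved memory.
type Candidate struct {
	ID    string
	Score float64
	Vec   []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// dedupe keeps the best-scoring representative of each group of
// candidates whose cosine similarity is at or above the threshold.
func dedupe(cands []Candidate, threshold float64) []Candidate {
	sorted := append([]Candidate(nil), cands...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})
	var reps []Candidate
	for _, c := range sorted {
		duplicate := false
		for _, r := range reps {
			if cosine(c.Vec, r.Vec) >= threshold {
				duplicate = true
				break
			}
		}
		if !duplicate {
			reps = append(reps, c)
		}
	}
	return reps
}

func main() {
	cands := []Candidate{
		{ID: "a", Score: 0.9, Vec: []float64{1, 0}},
		{ID: "b", Score: 0.8, Vec: []float64{0.99, 0.1}}, // near-duplicate of a
		{ID: "c", Score: 0.7, Vec: []float64{0, 1}},
	}
	for _, c := range dedupe(cands, 0.88) {
		fmt.Println(c.ID)
	}
}
```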

4. Tag-Based Retrieval

A memory_tags table indexes extracted entities, subjects, and keywords. Candidates matching query tags receive a small additive relevance boost.

5. Adaptive Threshold Methods

Instead of a fixed 0.6 relevance floor, MIRA now supports three dynamic methods:

| Method | Description | Default |
|---|---|---|
| iqr | First quartile of score distribution | Yes |
| elbow | Largest derivative drop (elbow method) | |
| mean_stddev | mean - stddev | |

The threshold is clamped between 0.15 (floor) and 0.75 (ceiling).
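As an illustration of the default method, the sketch below takes the first quartile of the candidate scores and clamps it to the floor and ceiling. The quartile uses linear interpolation between neighboring order statistics, and an empty score list falls back to the floor; both are assumptions, since the exact definitions MIRA uses are not stated here.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// firstQuartile returns the 25th percentile of a non-empty slice,
// using linear interpolation between neighboring order statistics.
func firstQuartile(scores []float64) float64 {
	s := append([]float64(nil), scores...)
	sort.Float64s(s)
	pos := 0.25 * float64(len(s)-1)
	lo := int(math.Floor(pos))
	hi := int(math.Ceil(pos))
	return s[lo] + (pos-float64(lo))*(s[hi]-s[lo])
}

// adaptiveThreshold returns the first quartile clamped to [floor, ceiling].
// With no scores it falls back to the floor (an assumption of this sketch).
func adaptiveThreshold(scores []float64, floor, ceiling float64) float64 {
	if len(scores) == 0 {
		return floor
	}
	return math.Min(ceiling, math.Max(floor, firstQuartile(scores)))
}

func main() {
	scores := []float64{0.2, 0.4, 0.6, 0.8, 1.0}
	fmt.Printf("%.3f\n", adaptiveThreshold(scores, 0.15, 0.75))
}
```

The clamp keeps the threshold usable at both extremes: a cluster of very high scores cannot push it above 0.75, and a cluster of very low scores cannot drop it below 0.15.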

6. Heuristic Reranker (Optional)

A lightweight pure-Go reranker scores top-k candidates using:

  • Jaccard-like token overlap
  • Exact phrase presence bonus
  • Length balance preference

Blended with semantic relevance: 0.7*semantic + 0.3*rerank.
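The blend can be illustrated with a reduced Go sketch that implements only the token-overlap signal as plain Jaccard similarity over lowercase whitespace-separated tokens; the phrase bonus and length preference are omitted. The function names are hypothetical, and MIRA's "Jaccard-like" measure may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenSet splits text on whitespace and returns the set of lowercase tokens.
func tokenSet(s string) map[string]struct{} {
	set := map[string]struct{}{}
	for _, w := range strings.Fields(strings.ToLower(s)) {
		set[w] = struct{}{}
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| over the token sets of a and b.
func jaccard(a, b string) float64 {
	ta, tb := tokenSet(a), tokenSet(b)
	if len(ta) == 0 && len(tb) == 0 {
		return 0
	}
	inter := 0
	for w := range ta {
		if _, ok := tb[w]; ok {
			inter++
		}
	}
	return float64(inter) / float64(len(ta)+len(tb)-inter)
}

// blend combines the semantic relevance with the rerank score
// using the 0.7 / 0.3 weights.
func blend(semantic, rerank float64) float64 {
	return 0.7*semantic + 0.3*rerank
}

func main() {
	query := "why postgresql"
	memory := "postgresql decision"
	fmt.Printf("%.3f\n", blend(0.8, jaccard(query, memory)))
}
```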

7. Fallback Vector Store

If HNSW is not yet ready (e.g., while the index is being built from scratch), a transparent fallback wrapper routes searches to the SQLite vector store, so recall keeps working in the meantime.
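A wrapper of this kind can be sketched with two small interfaces. Everything in the example is hypothetical (the `VectorStore` and `ReadyStore` interfaces, the `FallbackStore` type, and the stub store used for the demo); it only illustrates the routing decision, not MIRA's actual ports.

```go
package main

import "fmt"

// VectorStore is an illustrative search interface.
type VectorStore interface {
	Search(query []float32, k int) ([]string, error)
}

// ReadyStore is a VectorStore that can report whether its index is usable.
type ReadyStore interface {
	VectorStore
	Ready() bool
}

// FallbackStore routes to Primary when it is ready, otherwise to Secondary.
type FallbackStore struct {
	Primary   ReadyStore
	Secondary VectorStore
}

// Search implements VectorStore, so callers do not need to know
// which backend answered.
func (f FallbackStore) Search(query []float32, k int) ([]string, error) {
	if f.Primary.Ready() {
		return f.Primary.Search(query, k)
	}
	return f.Secondary.Search(query, k)
}

// stubStore is a demo backend that returns its own name as the only hit.
type stubStore struct {
	name  string
	ready bool
}

func (s stubStore) Search(_ []float32, _ int) ([]string, error) {
	return []string{s.name}, nil
}

func (s stubStore) Ready() bool { return s.ready }

func main() {
	store := FallbackStore{
		Primary:   stubStore{name: "hnsw", ready: false}, // index still building
		Secondary: stubStore{name: "sqlite", ready: true},
	}
	hits, err := store.Search([]float32{0.1, 0.2}, 10)
	fmt.Println(hits, err)
}
```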


Causal Graph

Supported Relations

┌─────────────────────────────────────────────────────────────────────┐
│                      CAUSAL RELATIONS                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   BECAUSE                    A ←────────── B                        │
│   "B explains why A"         Bug understood  Because we analyzed    │
│                              ───────────→   the logs                │
│                                                                     │
│   TRIGGERED                  A ←────────── B                        │
│   "B triggered A"            Migration    After the decision        │
│                              ───────────→  meeting                  │
│                                                                     │
│   CONTRADICTS                A ←────────→ B                         │
│   "A and B contradict"       Option A     Option B                  │
│                              ───────────→  incompatible             │
│                                                                     │
│   UPDATES                    A ←────────── B                        │
│   "B replaces/updates A"     Spec v1      Spec v2                   │
│                              ───────────→  (new version)            │
│                                                                     │
│   RESOLVES                   A ←────────── B                        │
│   "B resolves problem A"     Bug #123     Fix #124                  │
│                              ───────────→  (correction)             │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Automatic Detection

Relations are automatically detected via linguistic patterns:

// causalPatterns maps each relation type to a case-insensitive pattern of trigger phrases.
causalPatterns := map[RelationType]*regexp.Regexp{
    RelTriggered:   regexp.MustCompile(`(?i)(?:following|after|in response to)`),
    RelBecause:     regexp.MustCompile(`(?i)(?:because|since|due to|in reason of)`),
    RelContradicts: regexp.MustCompile(`(?i)(?:contradicts|in contradiction|however)`),
    RelUpdates:     regexp.MustCompile(`(?i)(?:updates|replaces)`),
    RelResolves:    regexp.MustCompile(`(?i)(?:resolves|solves|fixes)`),
}
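To show how such a pattern table might be applied, here is a self-contained sketch that returns every relation type whose pattern matches a text. The `RelationType` string values and the `detectRelations` function are assumptions for illustration; only the patterns are taken from the snippet above.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// RelationType names a causal relation. The string values are assumed.
type RelationType string

const (
	RelTriggered   RelationType = "TRIGGERED"
	RelBecause     RelationType = "BECAUSE"
	RelContradicts RelationType = "CONTRADICTS"
	RelUpdates     RelationType = "UPDATES"
	RelResolves    RelationType = "RESOLVES"
)

var causalPatterns = map[RelationType]*regexp.Regexp{
	RelTriggered:   regexp.MustCompile(`(?i)(?:following|after|in response to)`),
	RelBecause:     regexp.MustCompile(`(?i)(?:because|since|due to|in reason of)`),
	RelContradicts: regexp.MustCompile(`(?i)(?:contradicts|in contradiction|however)`),
	RelUpdates:     regexp.MustCompile(`(?i)(?:updates|replaces)`),
	RelResolves:    regexp.MustCompile(`(?i)(?:resolves|solves|fixes)`),
}

// detectRelations returns the relation types whose pattern matches text,
// sorted by name so the result is deterministic despite map iteration order.
func detectRelations(text string) []RelationType {
	var found []RelationType
	for rel, re := range causalPatterns {
		if re.MatchString(text) {
			found = append(found, rel)
		}
	}
	sort.Slice(found, func(i, j int) bool { return found[i] < found[j] })
	return found
}

func main() {
	fmt.Println(detectRelations("Fix #124 resolves bug #123"))
	fmt.Println(detectRelations("We migrated because the old database failed"))
}
```

Note that these patterns match substrings without word boundaries, so "since" also matches inside "sincere". Adding `\b` anchors would be one way to tighten them if false positives become a problem.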

Installation

Prerequisites

  • Go 1.23+ (if building from source)
  • SQLite3 (included)
  • ~100MB disk space for embedding model

From Source

# Clone the repository
git clone https://github.com/benoitpetit/mira.git
cd mira

# Build
go build -o mira ./cmd/mira

# Verify
./mira --version

Via Go Install

go install github.com/benoitpetit/mira/cmd/mira@latest

Binary Releases

Download pre-compiled binaries from the Releases page:

# Linux/macOS
tar -xzf mira-linux-amd64.tar.gz
sudo mv mira /usr/local/bin/
mira --version

# Windows
unzip mira-windows-amd64.zip
.\mira.exe --version

Quick Start

1. Initialization

# Copy example configuration
cp config.example.yaml config.yaml

# Edit to your needs
nano config.yaml

2. Start the MCP Server

# stdio mode (for Claude Desktop, Cursor, etc.)
./mira

# With custom config file
./mira -config ./config.yaml

# Run database migrations only
./mira -migrate

3. Use MCP Tools

Store a Memory

{
  "tool": "mira_store",
  "arguments": {
    "content": "We decided to migrate to PostgreSQL for v2. Rejected: MySQL (not ACID), MongoDB (not relational). Reason: ACID and team expertise. Approved by CTO. Assigned to Jean.",
    "wing": "backend-team",
    "room": "database-migration"
  }
}

Retrieve Context

{
  "tool": "mira_recall",
  "arguments": {
    "query": "Why did we choose PostgreSQL?",
    "budget": 2000,
    "wing": "backend-team"
  }
}

Response:

=== MIRA CONTEXT ===
Query: Why did we choose PostgreSQL? | Budget: 2000
Wing: backend-team

--- [1] FINGERPRINT (45 tokens) ---
Decision: PostgreSQL Migration
Rejected: MySQL, MongoDB
Reason: ACID, team expertise
Approved by: CTO
Assigned: Jean

--- [2] VERBATIM (120 tokens) ---
We decided to migrate to PostgreSQL for v2...
[full content]

=== Total: 165/2000 tokens (8.3%) ===

Causal Chain

{
  "tool": "mira_causal_chain",
  "arguments": {
    "id": "uuid-of-the-decision",
    "max_depth": 3,
    "include_consequences": true
  }
}
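Conceptually, a depth-limited causal chain is a breadth-first traversal of the causal graph that stops after max_depth hops. The Go sketch below illustrates the idea on an in-memory adjacency map; the `Graph` type and `causalChain` function are hypothetical and do not reflect how MIRA stores or traverses relations.

```go
package main

import "fmt"

// Graph maps a memory ID to the IDs of its direct causes.
type Graph map[string][]string

// causalChain walks the graph breadth-first from start and returns the
// causes reachable within maxDepth hops, in discovery order.
func causalChain(g Graph, start string, maxDepth int) []string {
	visited := map[string]bool{start: true}
	frontier := []string{start}
	var chain []string
	for depth := 0; depth < maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, id := range frontier {
			for _, cause := range g[id] {
				if !visited[cause] {
					visited[cause] = true // guards against cycles
					chain = append(chain, cause)
					next = append(next, cause)
				}
			}
		}
		frontier = next
	}
	return chain
}

func main() {
	g := Graph{
		"fix":      {"bug"},
		"bug":      {"deploy"},
		"deploy":   {"decision"},
		"decision": {"meeting"},
	}
	fmt.Println(causalChain(g, "fix", 3))
}
```

Each node and edge is visited at most once, which matches the O(V+E) bound for breadth-first search.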

Configuration

config.yaml File

system:
  version: "0.4.7"

storage:
  path: ".mira"
  sqlite:
    journal_mode: WAL
    synchronous: NORMAL
    cache_size: -64000
    mmap_size: 268435456
    temp_store: MEMORY

embeddings:
  current_model: "sentence-transformers/all-MiniLM-L6-v2"
  model_hash: "a2d8f3e9"
  dimension: 384
  batch_size: 32
  cache_size: 1000

# HNSW Vector Index Configuration
hnsw:
  M: 32 # Max neighbors per node (tuned for better recall, see v0.4.2)
  Ml: 0.25 # Level generation factor
  ef_construction: 0 # Inactive — not supported by underlying hnsw library
  ef_search: 100 # Dynamic candidate list for search (tuned, see v0.4.2)

allocator:
  default_budget: 4000
  max_candidates: 100
  early_pruning_threshold: 0.6
  session_window_seconds: 7200
  session_boost_beta: 0.2
  session_boost_max: 1.2
  causal_penalty_alpha: 0.15
  density_sigmoid:
    k: 2.0
    mu: 0.3

decay_rates:
  decision: 0.001
  fact: 0.005
  preference: 0.01
  session_note: 0.1
  debug_log: 0.5

archive_thresholds:
  session_note: 30
  debug_log: 7

overlap_cache:
  ttl_days: 30
  max_entries: 1000000

extraction:
  min_entity_length: 2
  causal_lookback: 50
  causal_max_days: 30

# Enhanced recall configuration
recall:
  adaptive_threshold_method: "iqr"
  adaptive_threshold_floor: 0.15
  adaptive_threshold_ceiling: 0.75
  enable_fts5: true
  fts5_limit: 100
  rrf_k: 60
  query_expansion:
    enabled: true
    num_variants: 3
    temperature: 0.3
  search_time_clustering:
    enabled: true
    similarity_threshold: 0.88
  reranker:
    enabled: false
    top_k: 30

# SOUL identity extension (disabled by default)
soul:
  enabled: false

mcp:
  name: "mira"
  version: "0.4.7"
  transport: "stdio"  # stdio is the only supported transport
  timeout_seconds: 30

# Prometheus metrics export
metrics:
  enabled: true
  prometheus_addr: ":9090"
  report_interval_seconds: 60

# Webhook notifications
webhooks:
  enabled: false
  workers: 3
  queue_size: 1000
  timeout_seconds: 30
  endpoints: []

MCP API

Available Tools

| Tool | Description |
|---|---|
| mira_store | Store memory with T0→T1,T2 extraction |
| mira_recall | Retrieve optimal context with budget |
| mira_load | Load full verbatim by ID |
| mira_causal_chain | Trace causal chain |
| mira_status | System statistics and health |
| mira_timeline | Filtered chronological reconstruction |
| mira_archive | Archive and clean old memories |
| mira_clear_memory | Permanently delete all or room-scoped memories |

Fallback Wings

mira_recall accepts a comma-separated list of fallback wings to search when the primary wing yields no results:

{
  "tool": "mira_recall",
  "arguments": {
    "query": "database migration strategy",
    "budget": 2000,
    "wing": "backend-team",
    "fallback_wings": "platform-team,dba-team"
  }
}

If the primary wing has no matching memories, MIRA will automatically search the fallback wings in order.
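The ordering logic can be sketched as follows, assuming a hypothetical per-wing `search` callback that returns matching memory IDs. The function name and signature are illustrative and not part of MIRA's API.

```go
package main

import (
	"fmt"
	"strings"
)

// recallWithFallback tries the primary wing first, then each wing in the
// comma-separated fallback list, and returns the first wing with results.
func recallWithFallback(primary, fallbackWings string, search func(wing string) []string) (string, []string) {
	wings := []string{primary}
	for _, w := range strings.Split(fallbackWings, ",") {
		if w = strings.TrimSpace(w); w != "" {
			wings = append(wings, w)
		}
	}
	for _, w := range wings {
		if results := search(w); len(results) > 0 {
			return w, results
		}
	}
	return "", nil // no wing had matching memories
}

func main() {
	// Toy in-memory index: only dba-team has a matching memory.
	index := map[string][]string{"dba-team": {"memory-1"}}
	search := func(wing string) []string { return index[wing] }

	wing, results := recallWithFallback("backend-team", "platform-team,dba-team", search)
	fmt.Println(wing, results)
}
```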

Multilingual & Broad Search

mira_recall supports queries in any language (English, French, Spanish, Italian, German, etc.) thanks to cross-lingual embeddings. If the initial semantic search yields sparse results — for example, when a query in one language searches against memories stored in another — MIRA automatically broadens the search with relaxed thresholds and merges the results. You do not need to translate queries or adjust parameters.

{
  "tool": "mira_recall",
  "arguments": {
    "query": "règles de langue français anglais",
    "budget": 2000,
    "wing": "general"
  }
}

See API_REFERENCES.md for detailed API reference and usage examples.

Health Check Endpoints

When metrics are enabled, MIRA exposes health endpoints:

# Full health check (includes DB, Vector Store, Embedder)
curl http://localhost:9090/health

# Liveness probe (Kubernetes)
curl http://localhost:9090/health/live

# Readiness probe (Kubernetes)
curl http://localhost:9090/health/ready

# Prometheus metrics
curl http://localhost:9090/metrics

Performance

Algorithmic Complexities

| Operation | Complexity | Notes |
|---|---|---|
| Store T0,T1,T2 | O(1) | Atomic insertion |
| Vector Search | O(log n) | HNSW ANN |
| CBA Scoring | O(n) | n = candidates |
| Allocation | O(n²) | Greedy selection |
| Causal Graph BFS | O(V+E) | V=nodes, E=edges |

Real-World Performance

| Metric | Value |
|---|---|
| HNSW Search | ~0.14 ms for 10K vectors (benchmarked, O(log n)) |
| SQLite Search | ~50 ms for 10K vectors (estimated) |
| Full Allocation | ~35 ms for 100 candidates (estimated) |
| Cosine Similarity | ~3.3M ops/sec |

Optimizations in v0.3.3

  • Query Expansion: Multi-variant embedding averaging for robust cross-lingual retrieval
  • FTS5 Lexical Search: SQLite full-text search integrated with auto-triggers and backfill
  • RRF Hybrid Fusion: Reciprocal Rank Fusion (k=60) combining dense HNSW and lexical FTS5 results
  • Search-Time Clustering: Real-time deduplication with cosine-similarity clustering (threshold 0.88)
  • Tag-Based Retrieval: memory_tags table with automatic tag boosting in CBA scoring
  • Heuristic Reranker: Optional lightweight lexical reranker for precision improvement
  • Adaptive Threshold Methods: Dynamic relevance pruning with iqr, elbow, and mean_stddev strategies
  • Fallback Vector Store: Transparent HNSW → SQLite fallback when the index is not ready
  • Clear Memory Tool: New mira_clear_memory MCP tool for global or room-scoped memory deletion
  • Causal Chain T0 Resolution: mira_causal_chain now correctly resolves T0: verbatim references to fingerprint IDs
  • ID Visibility in Outputs: mira_recall and mira_timeline now include memory IDs for downstream tool chaining
  • LLM-Self-Correction Errors: Invalid ID errors are now actionable, telling LLMs exactly where to get valid IDs

Optimizations in v0.3.1

  • Lazy Evaluation: Overlap calculation only for promising candidates
  • LRU Cache: 1000 entries for query embeddings
  • HNSW Persistence: Fast index reload on restart
  • SQLite WAL Mode: Concurrent read/write performance
  • Adaptive Threshold: Lowered relevance threshold for small corpora (<10 memories)
  • Default Room Mapping: Auto-assigns standard rooms based on memory type

Technical Architecture

Clean Architecture (Uncle Bob)

┌─────────────────────────────────────────────────────────────────────┐
│                    CLEAN ARCHITECTURE                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  DOMAIN (Enterprise Rules)                                  │   │
│   │  • entities: Verbatim, Fingerprint, Embedding, Candidate    │   │
│   │  • valueobjects: MemoryType, RenderMode, RelationType       │   │
│   │  ✓ No external dependencies                                 │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                              ▲                                      │
│                              │ Dependency                           │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  USE CASES (Application Rules)                              │   │
│   │  • StoreMemory, RecallMemory (CBA), LoadMemory              │   │
│   │  • GetTimeline, GetStatus, GetCausalChain, Archive          │   │
│   │  • ports: Repository interfaces                             │   │
│   │  ✓ Depends only on Domain                                   │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                              ▲                                      │
│                              │                                      │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  INTERFACE ADAPTERS                                         │   │
│   │  • storage: SQLiteRepository                                │   │
│   │  • vector: HNSWStore, SQLiteVectorStore                     │   │
│   │  • extraction: NativeExtractor, CybertronEmbedder           │   │
│   │  • webhook, metrics                                         │   │
│   │  ✓ Implements ports                                         │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                              ▲                                      │
│                              │                                      │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  FRAMEWORKS & DRIVERS                                       │   │
│   │  • SQLite3, HNSW lib, Cybertron, MCP Server                 │   │
│   │  ✓ External technical details                               │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Project Structure

mira/
├── cmd/mira/              # Entry point
├── internal/
│   ├── domain/            # Domain Layer
│   │   ├── entities/      # Business entities
│   │   └── valueobjects/  # Value objects
│   ├── usecases/          # Use Cases Layer
│   │   ├── ports/         # Interfaces (Repository, Services)
│   │   └── interactors/   # Use case implementations
│   ├── adapters/          # Adapters Layer
│   │   ├── storage/       # SQLite repository
│   │   ├── vector/        # HNSW, SQLite vector store, overlap cache
│   │   ├── extraction/    # NLP, embeddings
│   │   ├── logging/       # Structured logging
│   │   ├── webhook/       # HTTP notifications
│   │   └── metrics/       # Prometheus metrics
│   ├── interfaces/        # Interfaces Layer
│   │   └── mcp/           # MCP controller
│   ├── config/            # Configuration
│   └── app/               # Composition root (DI)
│       ├── main.go        # Dependency injection
│       ├── health.go      # Health checks
│       ├── health_test.go # Health check tests
│       └── main_test.go   # Application tests
├── docs/                  # Documentation
│   ├── INDEX.md           # Documentation entry point
│   ├── ARCHITECTURE.md    # Technical deep-dive
│   ├── FEATURES.md        # Complete feature catalog
│   └── API_REFERENCES.md  # API reference
├── SKILL.md               # Agent skill and memory loop guidelines
├── config.example.yaml    # Example configuration
└── README.md              # This file

Development

Testing

# Unit tests
go test -v ./...

# With race detector
go test -race ./...

# Benchmarks
go test -bench=. -benchmem ./...

# Coverage
go test -cover ./...

Make Commands

make build       # Build
make test        # Tests (with race detector)
make test-short  # Quick tests
make bench       # Benchmarks
make bench-full  # Full benchmarks
make run         # Build and run with config.yaml
make clean       # Clean build artifacts and data
make lint        # Run linters
make fmt         # Format code
make install     # Install to GOPATH/bin
make prepublish VERSION=x.y.z  # Prepare a release

Changelog

v0.4.7 (2026-04-24)

  • 🚀 New version 0.4.7

v0.4.6 (2026-04-24)

  • 🚀 New version 0.4.6

v0.4.5 (2026-04-24)

  • Unified SOUL configuration: Configure all SOUL settings directly inside MIRA's config.yaml — no separate process or config file needed for embedded mode. Supports drift threshold, recall budget, extraction confidence, model-swap reinforcement, and evolution history tuning.
  • Uses new soul.NewApplicationWithDBAndConfig API for seamless config passthrough.

v0.4.4 (2026-04-23)

  • SOUL opt-in integration: MIRA now runs standalone (8 tools) by default. SOUL identity extension must be explicitly enabled via --with-soul CLI flag or soul.enabled: true in config.
  • SOUL init failures are non-fatal — MIRA gracefully falls back to 8-tool mode.

v0.4.3 (2026-04-23)

  • Fixed SOUL MCP parameter names: agent → agent_id, model → model_id, from → from_model, to → to_model.

v0.4.2 (2026-04-17)

  • HNSW tuned defaults: M 16 → 32, ef_search 50 → 100 for better recall.
  • Concurrent embedding pool: Replaced global mutex with model instance pool.
  • Parallel recall pipeline: Dense HNSW + lexical FTS5 now execute concurrently.
  • ef_construction documented as inactive (not supported by underlying library).
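
As an illustration of the "model instance pool" idea from v0.4.2, here is a minimal sketch using a buffered channel as the pool. It is not MIRA's implementation: `Embedder`, `Pool`, `NewPool`, and the placeholder vector are hypothetical names and behavior chosen for the example. Unlike a single global mutex, which serializes every call, a pool of N instances lets up to N embedding calls run at once.

```go
package main

import (
	"fmt"
	"sync"
)

// Embedder is a hypothetical stand-in for one loaded embedding model instance.
type Embedder struct{ id int }

// Embed returns a placeholder one-element vector (the text length).
// A real model would run inference here.
func (e *Embedder) Embed(text string) []float32 {
	return []float32{float32(len(text))}
}

// Pool hands out model instances through a buffered channel.
type Pool struct{ instances chan *Embedder }

// NewPool creates a pool holding `size` independent instances.
func NewPool(size int) *Pool {
	p := &Pool{instances: make(chan *Embedder, size)}
	for i := 0; i < size; i++ {
		p.instances <- &Embedder{id: i}
	}
	return p
}

// Embed borrows an instance, uses it, and returns it to the pool.
// Callers block only when every instance is busy.
func (p *Pool) Embed(text string) []float32 {
	e := <-p.instances
	defer func() { p.instances <- e }()
	return e.Embed(text)
}

func main() {
	pool := NewPool(2)
	texts := []string{"alpha", "be", "gamma!"}
	out := make([][]float32, len(texts))

	var wg sync.WaitGroup
	for i, t := range texts {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			out[i] = pool.Embed(t) // each goroutine writes its own slot
		}(i, t)
	}
	wg.Wait()
	fmt.Println(out)
}
```

The pool size trades memory for throughput, since each instance holds its own copy of the model.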

See CHANGELOG.md for the full release history.


References

Key Libraries

  • tiktoken-go - OpenAI tokenization
  • Native Go implementation - Rule-based NLP/NER (replaces the archived prose library)
  • cybertron - Transformer embeddings
  • hnsw - HNSW graphs
  • mcp-go - MCP protocol

Embedding Model

  • Model: sentence-transformers/all-MiniLM-L6-v2
  • Dimensions: 384
  • Size: ~80MB
  • Performance: ~1000 texts/sec on CPU

MIRA - Memory with Information-theoretic Relevance Allocation

"Memory is the sap of artificial intelligence."

