
[lore 6/7] Ingester + heuristic classifier (Path B) #5

Merged: kunallanjewar merged 1 commit into main from quest-6-ingest-heuristic on Apr 28, 2026



@kunallanjewar
Contributor

Summary

  • Adds pkg/lore/ingest: Ingester interface, Result/FileError types, WalkDir filesystem walker, ChunkFile markdown chunker (H1/H2/H3 boundaries with YAML front matter)
  • Adds pkg/lore/ingest/heuristic: NewIngester with four-level classification (front matter > path rules > heading patterns > fallback=research), DefaultRules table covering ADR/runbook/concept/reference/agent-config patterns, functional options WithRules/WithLogger/WithTracer/WithMaxFileSize
  • OTel spans: lore.ingest.process (path.count, entries.count attrs) and lore.ingest.classify (per-file)
  • Testdata corpus (5 fixture files) and 9 tests covering all classification paths, walker skip behavior, chunker boundaries, and per-file error handling without aborting the walk
  • README updated with Ingester quick start, classification priority, customization example, and walker behavior notes
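
To make the chunker bullet concrete, here is a minimal sketch of splitting a markdown file at H1/H2/H3 boundaries after peeling off a YAML front matter block. It is not the code from this PR: the `Chunk` type and the `splitFrontMatter`, `headingLevel`, and `chunkMarkdown` helpers are hypothetical names, the front matter is returned as raw text rather than parsed, and `#` lines inside fenced code blocks are not special-cased.

```go
package main

import (
	"fmt"
	"strings"
)

// Chunk is a hypothetical type for this sketch; the PR's real types are not shown.
type Chunk struct {
	Heading string // heading text without the leading '#'s; empty for a preamble
	Level   int    // 1..3 for H1..H3, 0 for text before the first heading
	Body    string // trimmed text under the heading
}

// splitFrontMatter separates a leading "---" delimited block from the rest.
// If the block is missing or unterminated, the whole input is treated as body.
func splitFrontMatter(src string) (frontMatter, rest string) {
	lines := strings.Split(src, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", src
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			return strings.Join(lines[1:i], "\n"), strings.Join(lines[i+1:], "\n")
		}
	}
	return "", src
}

// headingLevel returns 1..3 for "# ", "## ", "### " lines and 0 otherwise,
// so H4 and deeper headings stay inside the enclosing chunk.
func headingLevel(line string) int {
	n := 0
	for n < len(line) && line[n] == '#' {
		n++
	}
	if n >= 1 && n <= 3 && n < len(line) && line[n] == ' ' {
		return n
	}
	return 0
}

// chunkMarkdown starts a new chunk at every H1/H2/H3 line.
func chunkMarkdown(body string) []Chunk {
	var chunks []Chunk
	var buf []string
	cur := Chunk{}
	flush := func() {
		text := strings.TrimSpace(strings.Join(buf, "\n"))
		// Keep heading chunks even when empty; drop an empty preamble.
		if cur.Level > 0 || text != "" {
			cur.Body = text
			chunks = append(chunks, cur)
		}
		buf = nil
	}
	for _, line := range strings.Split(body, "\n") {
		if lvl := headingLevel(line); lvl > 0 {
			flush()
			cur = Chunk{Heading: strings.TrimSpace(line[lvl:]), Level: lvl}
			continue
		}
		buf = append(buf, line)
	}
	flush()
	return chunks
}

func main() {
	src := "---\nkind: adr\n---\nintro\n# Title\ntext\n## Sub\nmore\n"
	fm, rest := splitFrontMatter(src)
	fmt.Printf("front matter: %q\n", fm)
	for _, c := range chunkMarkdown(rest) {
		fmt.Printf("level=%d heading=%q body=%q\n", c.Level, c.Heading, c.Body)
	}
}
```

A real chunker would likely also track fenced code blocks so that shell comments or nested markdown are not mistaken for headings.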

Design notes

The Ingester is a pure functional transform: Process returns []lore.Entry and the caller writes to Store + VectorStore. This keeps the ingester composable with any transactional or batching strategy.
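
A rough sketch of what this separation looks like from the caller's side. Every type and method signature below (`Entry`, `Ingester.Process`, `Store.Put`, `VectorStore.Index`, and the in-memory fakes) is a hypothetical stand-in for illustration; the actual `lore` interfaces are not shown in this PR description and may differ.

```go
package main

import (
	"context"
	"fmt"
)

// Entry, Ingester, Store and VectorStore are hypothetical stand-ins for the
// real lore types, which are not reproduced in this PR description.
type Entry struct {
	Path string
	Kind string
	Text string
}

type Ingester interface {
	Process(ctx context.Context, paths []string) ([]Entry, error)
}

type Store interface {
	Put(ctx context.Context, e Entry) error
}

type VectorStore interface {
	Index(ctx context.Context, e Entry) error
}

// persist shows the caller owning all writes: the ingester only produces
// entries, so the caller is free to wrap this loop in a transaction or batch.
func persist(ctx context.Context, ing Ingester, s Store, v VectorStore, paths []string) (int, error) {
	entries, err := ing.Process(ctx, paths)
	if err != nil {
		return 0, fmt.Errorf("process: %w", err)
	}
	written := 0
	for _, e := range entries {
		if err := s.Put(ctx, e); err != nil {
			return written, fmt.Errorf("store %s: %w", e.Path, err)
		}
		if err := v.Index(ctx, e); err != nil {
			return written, fmt.Errorf("index %s: %w", e.Path, err)
		}
		written++
	}
	return written, nil
}

// In-memory fakes so the sketch runs on its own.
type staticIngester struct{ entries []Entry }

func (s staticIngester) Process(context.Context, []string) ([]Entry, error) {
	return s.entries, nil
}

type memStore struct{ puts []Entry }

func (m *memStore) Put(_ context.Context, e Entry) error {
	m.puts = append(m.puts, e)
	return nil
}

type memVectors struct{ indexed []string }

func (m *memVectors) Index(_ context.Context, e Entry) error {
	m.indexed = append(m.indexed, e.Path)
	return nil
}

// demo wires the fakes together and reports how many writes each side saw.
func demo() (written, stored, indexed int, err error) {
	ing := staticIngester{entries: []Entry{
		{Path: "docs/adr/0001.md", Kind: "adr"},
		{Path: "notes/idea.md", Kind: "research"},
	}}
	s, v := &memStore{}, &memVectors{}
	written, err = persist(context.Background(), ing, s, v, []string{"docs", "notes"})
	return written, len(s.puts), len(v.indexed), err
}

func main() {
	written, stored, indexed, err := demo()
	fmt.Println(written, stored, indexed, err)
}
```

Because `persist` lives outside the ingester, swapping in a different write strategy means changing only the caller.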

Classification priority (first match wins):

  1. YAML front matter kind: field
  2. Path rules (filepath.Match globs, tested against both repo-relative path and basename)
  3. Heading keywords in H1/H2/H3 text
  4. Fallback: kind=research
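
The priority chain above can be sketched as a single function that returns at the first level that produces a kind. This is an illustration, not the PR's implementation: the `Rule` and `HeadingRule` types, the example rule tables, and the kind strings other than `research` are assumptions. Only `filepath.Match` and the order of the four levels come from the description.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Rule and HeadingRule are hypothetical shapes for this sketch.
type Rule struct {
	Pattern string // filepath.Match glob
	Kind    string
}

type HeadingRule struct {
	Keyword string // lowercase substring to look for in heading text
	Kind    string
}

// Example tables only; the PR's DefaultRules are not reproduced here.
var exampleRules = []Rule{
	{Pattern: "docs/adr/*.md", Kind: "adr"},
	{Pattern: "RUNBOOK*.md", Kind: "runbook"},
}

var exampleHeadingRules = []HeadingRule{
	{Keyword: "decision", Kind: "adr"},
	{Keyword: "runbook", Kind: "runbook"},
}

// matchPath tests each glob against the repo-relative path and the basename.
// Malformed patterns (a non-nil error from filepath.Match) are skipped.
func matchPath(rules []Rule, relPath string) (string, bool) {
	base := filepath.Base(relPath)
	for _, r := range rules {
		for _, candidate := range []string{relPath, base} {
			if ok, err := filepath.Match(r.Pattern, candidate); err == nil && ok {
				return r.Kind, true
			}
		}
	}
	return "", false
}

// classify applies the four levels in order; the first match wins.
func classify(frontMatterKind, relPath string, headings []string, rules []Rule, hrules []HeadingRule) string {
	// 1. Explicit kind from YAML front matter.
	if k := strings.TrimSpace(frontMatterKind); k != "" {
		return k
	}
	// 2. Path rules.
	if k, ok := matchPath(rules, relPath); ok {
		return k
	}
	// 3. Heading keywords.
	for _, h := range headings {
		lower := strings.ToLower(h)
		for _, hr := range hrules {
			if strings.Contains(lower, hr.Keyword) {
				return hr.Kind
			}
		}
	}
	// 4. Fallback.
	return "research"
}

func main() {
	fmt.Println(classify("concept", "docs/adr/0001.md", nil, exampleRules, exampleHeadingRules))
	fmt.Println(classify("", "ops/RUNBOOK-db.md", nil, exampleRules, exampleHeadingRules))
	fmt.Println(classify("", "notes/x.md", []string{"Misc"}, exampleRules, exampleHeadingRules))
}
```

Note that `*` in a `filepath.Match` pattern does not cross path separators, which is why testing the basename separately lets a rule like `RUNBOOK*.md` match at any depth.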

Test plan

  • go test -race ./pkg/lore/ingest/... passes
  • go vet ./... clean
  • gofmt -l ./pkg/lore/ingest/ produces no output
  • PII grep returns zero matches

@kunallanjewar marked this pull request as ready for review on April 28, 2026 15:31
@kunallanjewar force-pushed the quest-6-ingest-heuristic branch 2 times, most recently from 2457b71 to 5f385f8, on April 28, 2026 15:36
…unker (Path B)

Adds pkg/lore/ingest: Ingester interface, WalkDir filesystem walker, ChunkFile
markdown chunker (H1/H2/H3 boundaries + YAML frontmatter), and
pkg/lore/ingest/heuristic: NewIngester with four-level classification
(frontmatter > path rules > heading patterns > fallback=research),
DefaultRules covering ADR/runbook/concept/reference/agent-config paths,
functional options WithRules/WithLogger/WithTracer/WithMaxFileSize, OTel spans
(lore.ingest.process + lore.ingest.classify), slog structured logging,
testdata fixture corpus, and tests covering all classification paths.
@kunallanjewar force-pushed the quest-6-ingest-heuristic branch from 5f385f8 to 5753741 on April 28, 2026 15:38
@kunallanjewar merged commit 16ad27a into main on Apr 28, 2026
4 checks passed
@kunallanjewar deleted the quest-6-ingest-heuristic branch on April 28, 2026 15:40
