Tasks are grouped by component/feature area, with priorities P0-P3 (P0=highest, P3=lowest). Note that priorities were assigned by Copilot, and I don't always agree (it often assigns P0 or P1 to complicated work items).
Each item also has a rough size estimate.
Gradually move work items from here to repo Issues.
- [P1, medium] Scrutinize sqlite/reltermsindex.py
- [P1, medium] Review the new storage code more carefully, adding notes
- [P1, large] Unify tests for storage APIs
- [P1, medium] Make (de)serialize methods async in interfaces.py if they might execute SQL statements
- [P1, medium] Implement SqliteRelatedTermsAliases.serialize() and other missing serialize()/deserialize() methods
- [P2, large] Fix all bugs caused by assuming ordinals/IDs start at 0 and have no gaps (prepare for deletions)
- [P2, large] Refactor memory and sqlite indexes to share more code (e.g. population and query logic)
- [P2, medium] Make the collection/index accessors in StorageProvider synchronous (the async work is all done in create())
- [P2, medium] Replace the storage accessors with read-only @property functions
- [P2, medium] "Ordinals" should be renamed to "Id" (tedious though)
- [P3, medium] Flatten secondary indexes into Conversation (they are no longer optional), reducing the structure to:
  - Message collection
  - SemanticRef collection
  - SemanticRef index
  - Property to SemanticRef index
  - Timestamp to TextRange
  - Terms to related terms
- [P3, medium] Split related terms index in two (aliases and fuzzy_index)
- [P3, large] Implement consistent approach to deletions (tombstoning in sqlite, cascade delete semrefs and indexes)
- [P1, medium] Look more into why the search query schema is so unstable
- [P1, large] Redesign the whole pipeline; make each stage its own function with a simpler API
- [P1, large] Improve test coverage for search, searchlang, query, sqlite
- [P2, medium] Implement token budgets for answer generation (may leave out messages, favoring only knowledge)
- [P2, medium] Change answer context to be text (message texts and timestamps), not JSON or semantic ref ordinals
- [P2, medium] Split large answer contexts to avoid overflowing the answer generator's context buffer
- [P1, medium] Move TypedDicts out of interfaces.py (they don't belong there)
- [P1, medium] Fix need for `# type: ignore` comments (22 in `typeagent/`) by making I-interfaces more generic
- [P2, medium] Sort out why `IConversation` needs two generic parameters
- [P2, medium] Simplify `TTermToSemanticRefIndex` generic parameter
- [P2, medium] Tighten types: several places allow `None` and construct default instances; either disallow `None` or skip that functionality
- [P2, medium] Remove a bunch of `XxxData` TypedDicts that can be dealt with using `deserialize_object` and `serialize_object`
- [P2, small] Catch and report `DeserializationError` better
- [P2, medium] Look into whether Pydantic can do our (de)serialization (presumably faster?)
- [P1, large] Make coding style more uniform (e.g. docstrings)
- [P1, large] Reduce code size
- [P2, small] Avoid most inline imports
- [P2, medium] Break cycles by moving things to their own file if necessary
- [P2, medium] Unify, align, or refactor `VectorBase` and `EmbeddingIndex`
- [P2, medium] Address `TODO` comments (too numerous)
- [P2, medium] Address `raise NotImplementedError("TODO")` (five found) by implementing the missing code
- [P3, medium] Make module names consistent (Claude uses a different naming style)
- [P3, medium] Rewrite podcast parsing without regexes
- [P3, medium] Switch from Protocol to ABC
- [P3, medium] Reduce duplication between ingest_vtt.py and typeagent/transcripts/
- [P3, small] Rename `kplib.py` to something ending in `_schema.py`
- [P1, large] Add new tests for newly added classes/methods/functions
- [P2, medium] Review Copilot-generated tests for sanity and minimal mocking
- [P1, small] Document test/build/release process
- [P1, small] Document how to run evaluations (but don't share the data)
- [P2, large] Document low-level APIs (key parts used directly by high-level APIs, e.g. ConversationSettings)
- [P3, small] Move `typeagent` into `src/`
- [P3, tiny] Move `test/` to `tests/`