feat(composio): promote GitHub from catalog-only to native memory provider #2413
justinhsu1477 wants to merge 1 commit into
Adds `GitHubProvider` next to the existing `gmail` / `notion` / `slack` / `clickup` providers, making it the fifth native memory-ingest provider in `composio/providers/`. Until now `github/mod.rs` declared itself "curated tool catalog only — no native ComposioProvider implementation yet"; this change closes that gap so the connected user's assigned GitHub issues stream into the Memory Tree on the periodic scheduler.

Implementation mirrors the ClickUp (tinyhumansai#2291, merged) and Linear (tinyhumansai#2402, in review) providers 1:1: same `SyncState` semantics, same `persist_single_item` ingest path, same daily-budget discipline, same "fetch-what-the-user-sees" assignee-scoped fetch.

## Sync model

1. Load `SyncState` and apply the daily budget gate.
2. Resolve the viewer login via `GITHUB_GET_AUTHENTICATED_USER`.
3. Re-check the budget after the probe (per the CodeRabbit lesson on tinyhumansai#2291 / tinyhumansai#2402: never burn a list call when the probe just spent the last budget slot).
4. Page through `GITHUB_SEARCH_ISSUES` with `is:issue assignee:<login> sort:updated-desc`. GitHub's search caps results at 1000, so pagination is naturally bounded; we also stop early on the cursor boundary or a short page.
5. Persist each issue as one memory document via `persist_single_item`. The composite `issue_id@updated_at` dedup key re-syncs edited issues.
6. Advance the cursor to the newest `updated_at`, record `last_sync_at_ms`, and save state.

The transport-error path in the pagination loop persists state before returning the error (CodeRabbit lesson on tinyhumansai#2402: a flap mid-pagination must not roll back budget accounting).

## Privacy posture

`assignee:<viewer_login>` is constructed inside the provider and never accepted from a caller, so the privacy boundary can't be tunnelled around. This matches the discipline gmail / notion / clickup / linear already follow.

## Source-id convention

`composio-github-issue-<global_id>`. GitHub's `id` is globally unique across all of GitHub, so it's a stable upsert key.
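The strings the provider derives on its own (the assignee-scoped search query and the upsert/dedup keys) can be sketched as follows. This is a minimal, std-only illustration under stated assumptions: the viewer payload is modeled as a flat `HashMap` instead of JSON, `issue_id` in the dedup key is assumed to be the same global `id` used in the source id, and the names `viewer_login`, `search_query`, `source_id`, and `dedup_key` are hypothetical rather than the PR's actual helpers.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the GITHUB_GET_AUTHENTICATED_USER response,
/// flattened to string fields (the real payload is JSON).
type ViewerPayload = HashMap<String, String>;

/// Strict login extraction: only `login` is accepted. Falling back to `name`
/// or `id` could silently scope the search to the wrong identity.
fn viewer_login(payload: &ViewerPayload) -> Option<&str> {
    payload
        .get("login")
        .map(|s| s.trim())
        .filter(|s| !s.is_empty())
}

/// The assignee qualifier is built from the probed login only; no caller
/// input ever reaches this query string.
fn search_query(login: &str) -> String {
    format!("is:issue assignee:{login} sort:updated-desc")
}

/// Stable upsert key: GitHub's `id` is globally unique.
fn source_id(global_id: u64) -> String {
    format!("composio-github-issue-{global_id}")
}

/// Composite dedup key: editing an issue changes `updated_at`, so the key
/// changes and the issue is re-ingested.
fn dedup_key(global_id: u64, updated_at: &str) -> String {
    format!("{global_id}@{updated_at}")
}
```

Because `search_query` only ever receives the login returned by the authenticated-user probe, there is no parameter through which a caller could widen the assignee scope.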
Document titles surface the canonical `owner/repo#number` form (e.g. `GitHub tinyhumansai/openhuman#2408: …`) so search hits read the way contributors refer to issues in conversation.

## Curated tool catalog

`GITHUB_CURATED` (in `github/tools.rs` since the catalog-only era) is unchanged. `GITHUB_GET_AUTHENTICATED_USER` and `GITHUB_SEARCH_ISSUES` were already curated, which is the minimum the sync path needs. No new actions added.

## Files

Added:
- `composio/providers/github/provider.rs`: `GitHubProvider` impl (~415 lines)
- `composio/providers/github/sync.rs`: payload helpers (~360 lines)
- `composio/providers/github/tests.rs`: 18 unit tests (~165 lines)

Modified:
- `composio/providers/github/mod.rs`: declare the new modules and re-export the provider; remove the stale "no native impl yet" disclaimer
- `composio/providers/mod.rs`: `has_native_provider` arm, `native_provider_sync_interval` arm, and a new regression test `capability_matrix_includes_github_as_native_memory_provider` (`catalog_for_toolkit` already pointed at `github::GITHUB_CURATED`; no change needed there)
- `composio/providers/registry.rs`: `register_provider` in `init_default_providers`
- `composio/providers/descriptions.rs`: GitHub description updated to mention Memory Tree sync

## Verification

- `cargo check --lib`: clean (pre-existing warnings only)
- `cargo test --lib composio::providers::github`: 40/40 pass
- `cargo test --lib composio::providers`: 305/305 pass (no regression on gmail / notion / slack / clickup)
- `cargo test --lib capability_matrix_includes_github`: 1/1 pass
- `cargo fmt --check`: clean
- `cargo clippy --lib --no-deps`: no new warnings in `github/`

Closes tinyhumansai#2408
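The `owner/repo#number` title form described in the commit message above can be sketched as: prefer `repository.full_name`, else parse `repository_url` (for GitHub Search API issue items this URL has the form `https://api.github.com/repos/{owner}/{repo}`). The `IssueHit` struct and the `repo_slug` / `document_title` names are hypothetical; the real helpers in `sync.rs` work on JSON payloads and may accept more shapes.

```rust
/// Hypothetical, flattened view of one search hit.
struct IssueHit<'a> {
    number: u64,
    title: &'a str,
    full_name: Option<&'a str>,      // `repository.full_name`, e.g. "owner/repo"
    repository_url: Option<&'a str>, // e.g. "https://api.github.com/repos/owner/repo"
}

/// Resolves "owner/repo", preferring `full_name` and falling back to the URL.
fn repo_slug<'a>(hit: &IssueHit<'a>) -> Option<&'a str> {
    if let Some(full) = hit.full_name.filter(|s| s.contains('/')) {
        return Some(full);
    }
    let url = hit.repository_url?;
    let (_, rest) = url.split_once("/repos/")?;
    let rest = rest.trim_end_matches('/');
    // Accept exactly two non-empty segments: owner and repo.
    let mut parts = rest.split('/');
    match (parts.next(), parts.next(), parts.next()) {
        (Some(owner), Some(repo), None) if !owner.is_empty() && !repo.is_empty() => Some(rest),
        _ => None,
    }
}

/// Builds e.g. "GitHub tinyhumansai/openhuman#2408: <title>".
fn document_title(hit: &IssueHit) -> Option<String> {
    let slug = repo_slug(hit)?;
    Some(format!("GitHub {slug}#{}: {}", hit.number, hit.title))
}
```

Returning `None` when neither shape yields a slug keeps a malformed payload from producing a misleading title; how the real provider handles that case is not stated in the PR.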
📝 Walkthrough

This PR implements a complete GitHub provider for the Composio integration layer that enables incremental syncing of issues assigned to the authenticated user into Memory Tree. It includes JSON response normalization utilities, cursor-based pagination with daily request budgeting, persistent sync state management, and comprehensive test coverage, integrated into the global provider registry and capability matrix.

Changes

GitHub Provider Core
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant GitHubProvider
    participant SyncState
    participant ComposioApi as Composio API
    participant MemoryTree
    Client->>GitHubProvider: sync(reason)
    GitHubProvider->>SyncState: load persistent state
    GitHubProvider->>ComposioApi: GITHUB_GET_AUTHENTICATED_USER
    ComposioApi-->>GitHubProvider: viewer {login, name, email, ...}
    GitHubProvider->>GitHubProvider: check daily budget
    loop paginate through issues
        GitHubProvider->>ComposioApi: GITHUB_SEARCH_ISSUES (assignee:viewer, page)
        ComposioApi-->>GitHubProvider: {items: [...]}
        GitHubProvider->>GitHubProvider: extract & deduplicate issues
        GitHubProvider->>MemoryTree: persist unsynced issue documents
        GitHubProvider->>SyncState: mark items synced
    end
    GitHubProvider->>SyncState: advance cursor to newest updated_at
    GitHubProvider->>SyncState: save state
    GitHubProvider-->>Client: SyncOutcome {fetched, persisted, cursor, budget}
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
🧹 Nitpick comments (1)
src/openhuman/composio/providers/github/mod.rs (1)
24-25: ⚡ Quick win

Prefer `*_test.rs` wiring for extracted module tests.

Since tests were extracted, wire them via a sibling `*_test.rs` file to match the repository convention.

♻️ Suggested update

```diff
 #[cfg(test)]
-mod tests;
+#[path = "github_test.rs"]
+mod tests;
```

Also rename `src/openhuman/composio/providers/github/tests.rs` to `src/openhuman/composio/providers/github/github_test.rs`.

As per coding guidelines: “When extracting Rust tests out of an implementation file, prefer a sibling `*_test.rs` file wired in with `#[cfg(test)] #[path = "..._test.rs"] mod tests;`.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/composio/providers/github/mod.rs` around lines 24 - 25, Replace the current inline test module wiring (the existing #[cfg(test)] mod tests;) so it points to a sibling test file using a path attribute and rename the extracted tests file accordingly: create/rename src/openhuman/composio/providers/github/tests.rs to github_test.rs and change the module declaration to use #[cfg(test)] with #[path = "github_test.rs"] mod tests; so the tests are loaded from the sibling github_test.rs file instead of inline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4551cde9-34d5-4fa2-b112-369040495c1a
📒 Files selected for processing (7)
- `src/openhuman/composio/providers/descriptions.rs`
- `src/openhuman/composio/providers/github/mod.rs`
- `src/openhuman/composio/providers/github/provider.rs`
- `src/openhuman/composio/providers/github/sync.rs`
- `src/openhuman/composio/providers/github/tests.rs`
- `src/openhuman/composio/providers/mod.rs`
- `src/openhuman/composio/providers/registry.rs`
Status note for anyone scanning this surface:
If anyone has a parallel GitHub-provider implementation in progress or planned, please flag it in #2408 first so we can coordinate the scope split.
Summary
- Adds `GitHubProvider` next to the existing `gmail` / `notion` / `slack` / `clickup` providers; GitHub becomes the fifth native memory-ingest provider in `composio/providers/`. Until now `github/mod.rs` self-declared "curated tool catalog only — no native ComposioProvider implementation yet"; this PR closes that gap.
- Mirrors the ClickUp (#2291) and Linear (#2402) providers 1:1: same `SyncState` semantics, same `persist_single_item` ingest path, same daily-budget discipline, same "fetch-what-the-user-sees" assignee-scoped fetch.
- The `assignee:<viewer_login>` qualifier is constructed inside the provider and never accepted from a caller, so the boundary can't be tunnelled around.

Problem
`composio/providers/` already has working memory-ingest providers for gmail, notion, slack, clickup, and (pending merge of #2402) linear. `github/` was structurally one level lower: only `tools.rs` was populated, with no `ComposioProvider` impl. For an open-source AI assistant whose own contributors live on GitHub, that's the highest-leverage missing provider: every maintainer, reviewer, and external contributor can dogfood it within minutes of merge.

Solution
Promote `composio/providers/github/` to a full provider by adding the three companion files that the other native providers all have:

- `provider.rs`: the `GitHubProvider` impl
- `sync.rs`: payload helpers
- `tests.rs`: unit tests

Plus five registration touchpoints (the same shape ClickUp landed in #2291 and Linear in #2402):
- `composio/providers/mod.rs::has_native_provider`: add `"github"`.
- `composio/providers/mod.rs::native_provider_sync_interval`: add `"github" => …`.
- `composio/providers/mod.rs::catalog_for_toolkit`: no change; it already points to `github::GITHUB_CURATED`.
- `composio/providers/registry.rs::init_default_providers`: add `register_provider(Arc::new(GitHubProvider::new()))`.
- `composio/providers/descriptions.rs`: update the GitHub description to mention Memory Tree sync.

Sync model (mirroring ClickUp / Linear)
1. Load state with `SyncState::load("github", connection_id)` from the shared KV store.
2. Apply the daily budget gate (`DEFAULT_DAILY_REQUEST_LIMIT = 500`).
3. Probe `GITHUB_GET_AUTHENTICATED_USER`, which is needed for the `assignee:<login>` search filter.
4. Re-check the budget after the probe, so a list call is never made when the probe spent the last slot.
5. Paginate `GITHUB_SEARCH_ISSUES` with query `is:issue assignee:<viewer_login> sort:updated-desc`. GitHub's Search API caps results at 1000, so pagination is naturally bounded; we also stop early on the cursor boundary or a short page (`< per_page`).
6. Persist each issue via `persist_single_item`. Dedupe by the composite `issue_id@updated_at` key so edits re-ingest.
7. Advance the cursor to the newest `updated_at`, record `last_sync_at_ms`, and save state.

The transport-error path in the pagination loop persists state before propagating the error (CodeRabbit's other lesson from #2402: a flap mid-pagination must not roll back budget accounting).
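The sync model above can be sketched as a single loop. This is an illustrative, std-only sketch, not the PR's `provider.rs`: the `Api` trait, `DailyBudget`, the `SyncState` fields, and the `saves` counter (standing in for KV writes) are hypothetical, persistence is reduced to collecting issue ids, and timestamps are assumed to be ISO-8601 UTC strings so that lexical order matches time order.

```rust
use std::collections::HashSet;

// Bound quoted in the PR; the page size is passed in by the caller.
const MAX_PAGES_PER_SYNC: u32 = 20;

struct DailyBudget {
    used: u32,
    limit: u32,
}

impl DailyBudget {
    /// Consumes one request slot, or returns false when the day's budget is spent.
    fn try_consume(&mut self) -> bool {
        if self.used >= self.limit {
            return false;
        }
        self.used += 1;
        true
    }
}

/// Hypothetical stand-in for the persisted per-connection SyncState.
struct SyncState {
    budget: DailyBudget,
    cursor: Option<String>, // newest `updated_at` seen so far
    seen: HashSet<String>,  // `issue_id@updated_at` dedup keys
    saves: u32,             // counts writes to the (hypothetical) KV store
}

struct Issue {
    id: u64,
    updated_at: String,
}

/// Hypothetical transport abstraction over the two Composio actions.
trait Api {
    fn viewer_login(&mut self) -> Result<String, String>;
    fn search_page(&mut self, query: &str, page: u32, per_page: u32) -> Result<Vec<Issue>, String>;
}

fn sync(api: &mut dyn Api, state: &mut SyncState, per_page: u32) -> Result<Vec<u64>, String> {
    let mut persisted = Vec::new();
    // Budget gate before the viewer probe.
    if !state.budget.try_consume() {
        return Ok(persisted);
    }
    let login = api.viewer_login()?;
    let query = format!("is:issue assignee:{login} sort:updated-desc");
    let mut newest: Option<String> = None;

    for page in 1..=MAX_PAGES_PER_SYNC {
        // Re-check before every list call: the probe may have spent the last slot.
        if !state.budget.try_consume() {
            break;
        }
        let items = match api.search_page(&query, page, per_page) {
            Ok(items) => items,
            Err(e) => {
                state.saves += 1; // save budget accounting before propagating
                return Err(e);
            }
        };
        let short_page = (items.len() as u32) < per_page;
        let mut reached_cursor = false;
        for issue in &items {
            // Sorted updated-desc: anything older than the cursor was synced already.
            if state.cursor.as_deref().is_some_and(|c| issue.updated_at.as_str() < c) {
                reached_cursor = true;
                break;
            }
            if newest.is_none() {
                newest = Some(issue.updated_at.clone());
            }
            // An edited issue has a new updated_at, hence a new key, hence re-ingest.
            if state.seen.insert(format!("{}@{}", issue.id, issue.updated_at)) {
                persisted.push(issue.id); // stand-in for persist_single_item
            }
        }
        if reached_cursor || short_page {
            break;
        }
    }
    if newest.is_some() {
        state.cursor = newest;
    }
    state.saves += 1;
    Ok(persisted)
}
```

The budget is consumed once for the probe and re-checked before every search page, so a probe that takes the last slot ends the pass without a list call. The error branch records state before propagating, which covers the mid-pagination flap the PR calls out.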
Source-id convention
`composio-github-issue-<global_id>`: GitHub's `id` is globally unique across all of GitHub, so it's a stable upsert key. The document title surfaces the canonical `owner/repo#number` form (e.g. `GitHub tinyhumansai/openhuman#2408: …`) so search hits read the way contributors refer to issues in conversation.

Curated tool catalog
`GITHUB_CURATED` (already in `github/tools.rs` from the catalog-only era) is unchanged. `GITHUB_GET_AUTHENTICATED_USER` and `GITHUB_SEARCH_ISSUES` were already curated (the minimum the sync path needs), so this PR adds zero new tool actions.

Submission Checklist
- … `owner/repo#number` composition from both `repository.full_name` and `repository_url` shapes, a strict viewer-login extractor that refuses to fall back to non-login fields, viewer-id metadata extraction), trait metadata stability, capability matrix registration (`capability_matrix_includes_github_as_native_memory_provider`), and `default_impl_matches_new` (observable equivalence, not a no-op test).
- `sync()` async happy path: not unit-tested, since it sits behind a `ComposioProviderContext` that the existing test harness doesn't stand up (the gmail / notion / slack / clickup / linear tests don't exercise the live `sync()` end-to-end either). The helper layer is unit-tested directly.
- `Closes #2408` in `## Related`.

Impact
- … `SyncState` KV namespaces (`composio-sync-state`, keyed by `(toolkit, connection_id)`), their registered tool catalogs, and all catalog-only toolkits are unchanged. The github tool catalog is unchanged; this PR only adds the missing provider impl around it.
- … `MAX_PAGES_PER_SYNC = 20`, `PAGE_SIZE = 50` steady-state (`100` for the initial backfill), `DailyBudget = 500 req/day`, and GitHub's own 1000-result Search cap. Per sync pass: 1 viewer probe plus at most 20 search pages; because Search stops at 1000 results, that is 20 pages at 50 per page and 10 pages at 100 per page.
- … (`assignee:<viewer_login>`) prevents accidental ingest of other contributors' issues. The viewer-login extractor is strict (no fallback to `id` / `name` fields), so the filter can't silently scope to the wrong identity. Composio handles credentials; no new secret-handling code.

Related
- `composio/providers/clickup/` (feat(composio): add ClickUp provider for Memory Tree ingest #2291, merged), `composio/providers/linear/` (feat(composio): add Linear provider for Memory Tree ingest #2402, in review).
- `composio/providers/traits.rs`.
- `composio/providers/sync_state.rs`.

AI Authored PR Metadata
Linear Issue
Commit & Branch
- Branch: `feat/github-memory-provider`
- Base: `upstream/main` at fetch time

Validation Run
- `pnpm --filter openhuman-app format:check`: Rust-only change.
- `pnpm typecheck`: Rust-only change.
- `cargo test --lib composio::providers::github`: 40/40 pass (18 in `tests.rs` plus 22 inline tests in `sync.rs`); `cargo test --lib composio::providers`: 305/305 pass (no regression on gmail / notion / slack / clickup or any catalog-only toolkit).
- `cargo fmt --check` clean; `cargo check --lib` clean (pre-existing warnings only); `cargo clippy --lib --no-deps`: no new warnings in `composio/providers/github/`.
- … `app/src-tauri/src/**` changes.

Validation Blocked
Behavior Changes
- … `ConnectionCreated` hook.

Parity Contract
- … `SyncState` KV namespaces are unchanged.
- `GITHUB_CURATED` (already in `github/tools.rs`) is byte-identical; no tool actions added or removed.
- … `ComposioProvider` trait contract: daily budget, dedup-by-id, cursor-based pagination, idempotent `persist_single_item` upserts.
- `catalog_for_toolkit("github")` continues to return `github::GITHUB_CURATED` (already wired that way pre-PR; only the trait-impl side of `github/` is new).

Duplicate / Superseded PR Handling
- … `GitHubProvider` impl.

Summary by CodeRabbit
Release Notes