feat(composio): promote GitHub from catalog-only to native memory provider #2413

Open
justinhsu1477 wants to merge 1 commit into tinyhumansai:main from justinhsu1477:feat/github-memory-provider

Contributor

@justinhsu1477 justinhsu1477 commented May 21, 2026

Summary

  • Adds GitHubProvider next to the existing gmail / notion / slack / clickup providers — GitHub becomes the fifth native memory-ingest provider in composio/providers/. Until now github/mod.rs self-declared "curated tool catalog only — no native ComposioProvider implementation yet"; this PR closes that gap.
  • Implementation mirrors the ClickUp (feat(composio): add ClickUp provider for Memory Tree ingest #2291, merged) and Linear (feat(composio): add Linear provider for Memory Tree ingest #2402, in review) providers 1:1 — same SyncState semantics, same persist_single_item ingest path, same daily-budget discipline, same "fetch-what-the-user-sees" assignee-scoped fetch.
  • Privacy posture: only issues where the connected user is the assignee are pulled, never the whole watched-repos issue graph. The assignee:<viewer_login> qualifier is constructed inside the provider — never accepted from a caller — so the boundary can't be tunnelled around.
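A minimal sketch of the "constructed inside the provider" idea, under stated assumptions: `build_assignee_query` is a hypothetical name, and the character check assumes ordinary GitHub logins consist of ASCII alphanumerics and hyphens. The PR's actual helper and validation rules are not shown on this page.

```rust
/// Hypothetical helper: builds the assignee-scoped search query.
/// The login is meant to come only from the authenticated-user probe,
/// so no caller-supplied qualifier ever reaches the query string.
fn build_assignee_query(viewer_login: &str) -> Option<String> {
    // Assumption: reject anything that is not alphanumeric or a hyphen
    // (spaces, colons, quotes) instead of trying to escape it.
    let valid = !viewer_login.is_empty()
        && viewer_login
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-');
    if !valid {
        return None;
    }
    Some(format!(
        "is:issue assignee:{} sort:updated-desc",
        viewer_login
    ))
}
```

Returning `None` for a suspicious login makes the sync fail closed rather than widen the search scope.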

Problem

composio/providers/ already has working memory-ingest providers for gmail, notion, slack, clickup, and (pending merge of #2402) linear. github/ was structurally one level lower — only tools.rs populated, no ComposioProvider impl. For an open-source AI assistant whose own contributors live on GitHub, that's the highest-leverage missing provider: every maintainer / reviewer / external contributor can dogfood it within minutes of merge.

Solution

Promote composio/providers/github/ to a full provider by adding the three companion files that the other native providers all have:

src/openhuman/composio/providers/github/
  mod.rs        UPDATED  — declares provider/sync/tests modules,
                           drops "no native impl yet" disclaimer
  tools.rs      KEEP     — GITHUB_CURATED unchanged (already includes
                           GITHUB_GET_AUTHENTICATED_USER and
                           GITHUB_SEARCH_ISSUES which is all the sync
                           path needs; no new actions added)
  provider.rs   NEW      — impl ComposioProvider for GitHubProvider
  sync.rs       NEW      — payload helpers (extract_issues,
                           extract_issue_title, extract_issue_updated,
                           extract_repo_qualified_identifier,
                           extract_viewer_login, extract_viewer_id)
  tests.rs      NEW      — 18 unit tests

Plus five registration touchpoints (same shape ClickUp landed in #2291 and Linear in #2402):

  1. composio/providers/mod.rs::has_native_provider: add "github".
  2. composio/providers/mod.rs::native_provider_sync_interval: add "github" => ….
  3. composio/providers/mod.rs::catalog_for_toolkit: no change — already points to github::GITHUB_CURATED.
  4. composio/providers/registry.rs::init_default_providers: add register_provider(Arc::new(GitHubProvider::new())).
  5. composio/providers/descriptions.rs: update GitHub description to mention Memory Tree sync.
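The registration touchpoints can be pictured roughly as below. This is a simplified sketch, not the repository's code: the `Provider` trait, `Registry` struct, and the signature of `has_native_provider` are assumptions made for illustration, and the sync-interval arm is left out because its value is not given here.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, trimmed-down stand-in for the ComposioProvider trait.
trait Provider {
    fn toolkit_slug(&self) -> &'static str;
}

struct GitHubProvider;

impl Provider for GitHubProvider {
    fn toolkit_slug(&self) -> &'static str {
        "github"
    }
}

/// Hypothetical registry keyed by toolkit slug.
#[derive(Default)]
struct Registry {
    providers: HashMap<&'static str, Arc<dyn Provider>>,
}

impl Registry {
    fn register_provider(&mut self, provider: Arc<dyn Provider>) {
        self.providers.insert(provider.toolkit_slug(), provider);
    }

    fn get(&self, slug: &str) -> Option<&Arc<dyn Provider>> {
        self.providers.get(slug)
    }
}

/// Assumed shape of the dispatch check: one match arm per native provider.
fn has_native_provider(toolkit_slug: &str) -> bool {
    matches!(
        toolkit_slug,
        "gmail" | "notion" | "slack" | "clickup" | "github"
    )
}
```

Keeping the slug in one place (`toolkit_slug`) and matching on it elsewhere is what makes a missing arm easy to catch with a regression test.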

Sync model (mirroring ClickUp / Linear)

  1. SyncState::load("github", connection_id) from the shared KV store.
  2. Daily request budget check (DEFAULT_DAILY_REQUEST_LIMIT = 500).
  3. Resolve viewer login via GITHUB_GET_AUTHENTICATED_USER — needed for the assignee:<login> search filter.
  4. Re-check budget between the viewer probe and the search call (CodeRabbit lesson from #2291 / #2402: never burn a list call when the probe just spent the last budget slot).
  5. Page through GITHUB_SEARCH_ISSUES with query is:issue assignee:<viewer_login> sort:updated-desc. GitHub's Search API caps results at 1000 so pagination is naturally bounded; we also stop early on cursor boundary or short page (< per_page).
  6. Per issue, persist as one memory document via persist_single_item. Dedupe by composite issue_id@updated_at so edits re-ingest.
  7. Advance cursor to newest updated_at, record last_sync_at_ms, save state.
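The stop conditions in steps 2 to 5 can be sketched as a small loop. Everything here is illustrative: `Budget`, `Issue`, and `collect_new_issues` are hypothetical names, issues are reduced to an `(id, updated_at)` pair, and treating "at or before the cursor" as the boundary is an assumption about how the real provider compares timestamps.

```rust
/// Hypothetical stand-in for the daily request budget.
struct Budget {
    used: u32,
    limit: u32,
}

impl Budget {
    /// Spends one request if any budget is left.
    fn try_spend(&mut self) -> bool {
        if self.used >= self.limit {
            return false;
        }
        self.used += 1;
        true
    }
}

/// Minimal issue shape for the sketch: (id, updated_at as an ISO-8601 UTC string).
type Issue = (u64, String);

/// Pages newest-first until the budget runs out, the page limit is hit,
/// a short page arrives, or an issue at or before the cursor is seen.
fn collect_new_issues<F>(
    mut fetch_page: F,
    budget: &mut Budget,
    cursor: Option<&str>,
    per_page: usize,
    max_pages: u32,
) -> Vec<Issue>
where
    F: FnMut(u32) -> Vec<Issue>,
{
    let mut fresh = Vec::new();
    for page in 1..=max_pages {
        if !budget.try_spend() {
            break;
        }
        let items = fetch_page(page);
        let short_page = items.len() < per_page;
        let mut hit_cursor = false;
        for (id, updated_at) in items {
            // Same-format ISO-8601 UTC timestamps order correctly as strings.
            if cursor.map_or(false, |c| updated_at.as_str() <= c) {
                hit_cursor = true;
                break;
            }
            fresh.push((id, updated_at));
        }
        if hit_cursor || short_page {
            break;
        }
    }
    fresh
}
```

Because results are sorted by `updated` descending, the first issue at or before the cursor means every later one is already ingested, so the loop can stop without spending further requests.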

Transport-error path in the pagination loop persists state before propagating the error (CodeRabbit's other lesson from #2402: a flap mid-pagination must not roll back budget accounting).
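One way to read "persist before propagating" is sketched below. The names `SketchState` and `sync_pages`, the closure-based fetch and save hooks, and the `String` error type are all assumptions for illustration; the real provider works against `SyncState` and the Composio client.

```rust
/// Hypothetical, reduced sync state: only the budget counter matters here.
#[derive(Debug, Default, Clone, PartialEq)]
struct SketchState {
    requests_used_today: u32,
}

/// Fetches pages until an empty page or `max_pages`; on a transport error the
/// state is saved first, so requests already spent stay accounted for.
fn sync_pages<F, S>(
    mut fetch: F,
    mut save: S,
    state: &mut SketchState,
    max_pages: u32,
) -> Result<usize, String>
where
    F: FnMut(u32) -> Result<usize, String>,
    S: FnMut(&SketchState),
{
    let mut fetched = 0usize;
    for page in 1..=max_pages {
        // A request counts against the budget whether or not it succeeds.
        state.requests_used_today += 1;
        match fetch(page) {
            Ok(0) => break,
            Ok(n) => fetched += n,
            Err(e) => {
                // Save before returning, otherwise the counter would roll back.
                save(&*state);
                return Err(e);
            }
        }
    }
    save(&*state);
    Ok(fetched)
}
```

If the save were skipped on the error path, a flapping connection could retry indefinitely without the daily limit ever registering the failed calls.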

Source-id convention

composio-github-issue-<global_id> — GitHub's id is globally unique across all of GitHub so it's a stable upsert key. Document title surfaces the canonical owner/repo#number form (e.g. GitHub tinyhumansai/openhuman#2408: …) so search hits read the way contributors refer to issues in conversation.
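As a rough illustration of these conventions, the three strings could be composed as follows. The function names are hypothetical; only the formats (`composio-github-issue-<global_id>`, `issue_id@updated_at`, and the `GitHub owner/repo#number: …` title) come from the description above.

```rust
/// Stable upsert key built from GitHub's globally unique issue id.
fn source_id(global_id: u64) -> String {
    format!("composio-github-issue-{}", global_id)
}

/// Composite dedupe key: a later edit changes `updated_at`, so the issue re-ingests.
fn dedupe_key(global_id: u64, updated_at: &str) -> String {
    format!("{}@{}", global_id, updated_at)
}

/// Document title in the owner/repo#number form contributors use in conversation.
fn document_title(repo_full_name: &str, number: u64, title: &str) -> String {
    format!("GitHub {}#{}: {}", repo_full_name, number, title)
}
```

The source id stays constant across edits (so the document is upserted in place), while the dedupe key changes with every update.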

Curated tool catalog

GITHUB_CURATED (already in github/tools.rs from the catalog-only era) is unchanged. GITHUB_GET_AUTHENTICATED_USER and GITHUB_SEARCH_ISSUES were already curated — the minimum the sync path needs — so this PR adds zero new tool actions.

Submission Checklist

  • Tests added or updated — 18 new unit tests cover sync helpers (results across SEARCH/LIST envelope shapes, title / updated extraction, owner/repo#number composition from both repository.full_name and repository_url shapes, viewer-login strict extractor that refuses to fall back to non-login fields, viewer-id metadata extraction), trait metadata stability, capability matrix registration (capability_matrix_includes_github_as_native_memory_provider), and default_impl_matches_new (observable equivalence — not a no-op test).
  • Diff coverage ≥ 80%: new code is overwhelmingly the sync() async happy path, which sits behind a Composio ProviderContext that the existing test harness doesn't stand up (the gmail / notion / slack / clickup / linear tests don't exercise the live sync() end-to-end either). The helper layer is unit-tested directly.
  • N/A: Coverage matrix updated — extends the existing "Composio memory provider" capability row; no new matrix feature row.
  • N/A: All affected feature IDs from the matrix are listed — extending an existing capability.
  • No new external network dependencies introduced — all GitHub API access goes through the existing Composio backend / direct client.
  • N/A: Manual smoke checklist updated — no release-cut surface changes; new ingest path is feature-flagged behind "user has a GitHub Composio connection".
  • Linked issue closed via Closes #2408 in ## Related.
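The `owner/repo#number` composition from the two payload shapes mentioned in the first checklist item might look like the sketch below. It takes already-extracted strings rather than JSON to stay dependency-free, and `repo_qualified_identifier` is a hypothetical name; the PR's `extract_repo_qualified_identifier` operates on the Composio payload directly and may differ in detail.

```rust
/// Builds "owner/repo#number" from `repository.full_name` when present,
/// otherwise from a `repository_url` such as
/// "https://api.github.com/repos/<owner>/<repo>".
fn repo_qualified_identifier(
    full_name: Option<&str>,
    repository_url: Option<&str>,
    number: u64,
) -> Option<String> {
    let repo = match full_name {
        Some(name) if !name.is_empty() => name.to_string(),
        _ => {
            let url = repository_url?;
            // Keep only what follows "/repos/" and take the first two segments.
            let (_, tail) = url.split_once("/repos/")?;
            let mut parts = tail.trim_end_matches('/').split('/');
            let owner = parts.next().filter(|s| !s.is_empty())?;
            let name = parts.next().filter(|s| !s.is_empty())?;
            format!("{}/{}", owner, name)
        }
    };
    Some(format!("{}#{}", repo, number))
}
```

Returning `None` when neither shape is usable lets the caller fall back to a plainer title instead of emitting a malformed identifier.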

Impact

  • Runtime/platform impact: desktop core only (Rust). No Tauri shell, no frontend changes.
  • Compatibility impact: strictly additive. Existing providers, their SyncState KV namespaces (composio-sync-state keyed by (toolkit, connection_id)), their registered tool catalogs, and all catalog-only toolkits are unchanged. The github tool catalog is unchanged — only adds the missing provider impl around it.
  • Performance impact: bounded by MAX_PAGES_PER_SYNC = 20, PAGE_SIZE = 50 steady-state (100 for the initial backfill), DailyBudget = 500 req/day, and GitHub's own 1000-result Search cap. Per sync pass: 1 viewer probe + up to 20 search pages at steady state (10 during the 100-per-page initial backfill, where the 1000-result cap is reached first).
  • Security impact: assignee-scoped fetch (assignee:<viewer_login>) prevents accidental ingest of other contributors' issues. Viewer-login extractor is strict (no fallback to id / name fields) so the filter can't silently scope to the wrong identity. Composio handles credentials; no new secret-handling code.
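The performance bound can be checked with a little arithmetic. The constants below mirror the figures quoted in this list; the function names are hypothetical, and the assumption that the 1000-result cap and the page limit combine as a simple minimum is a simplification of whatever the provider actually does.

```rust
const MAX_PAGES_PER_SYNC: u32 = 20;
const DAILY_REQUEST_LIMIT: u32 = 500;
const SEARCH_RESULT_CAP: u32 = 1000;

/// Pages needed to exhaust the search cap, clamped to the per-sync page limit.
fn max_search_pages(page_size: u32) -> u32 {
    let size = page_size.max(1); // guard against division by zero
    let pages_to_cap = (SEARCH_RESULT_CAP + size - 1) / size; // ceiling division
    pages_to_cap.min(MAX_PAGES_PER_SYNC)
}

/// One viewer probe plus the search pages.
fn max_requests_per_pass(page_size: u32) -> u32 {
    1 + max_search_pages(page_size)
}

/// Worst-case full passes that fit in the daily budget.
fn full_passes_per_day(page_size: u32) -> u32 {
    DAILY_REQUEST_LIMIT / max_requests_per_pass(page_size)
}
```

With a page size of 50 the worst case is 21 requests per pass; with 100 it is 11, because ten pages already reach the 1000-result cap.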

Related


AI Authored PR Metadata

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/github-memory-provider
  • Commit SHA: (latest on the branch)
  • Base: upstream/main at fetch time

Validation Run

  • N/A: pnpm --filter openhuman-app format:check — Rust-only change.
  • N/A: pnpm typecheck — Rust-only change.
  • Focused tests: cargo test --lib composio::providers::github (40/40 pass — combines 18 in tests.rs and 22 in sync.rs inline tests); cargo test --lib composio::providers (305/305 pass — no regression on gmail / notion / slack / clickup or any catalog-only toolkit).
  • Rust fmt/check: cargo fmt --check clean; cargo check --lib clean (pre-existing warnings only); cargo clippy --lib --no-deps no new warnings in composio/providers/github/.
  • N/A: Tauri fmt/check — no app/src-tauri/src/** changes.

Validation Blocked

  • N/A

Behavior Changes

  • Intended behavior change: users with a Composio-connected GitHub account now have their assigned issues periodically ingested into the Memory Tree on the existing 30-minute scheduler cadence, with initial backfill triggered by the ConnectionCreated hook.
  • User-visible effect: GitHub issue content (title, body, state, labels, assignees, repo, created_at, updated_at as JSON) becomes available to the agent and retrieval layer the same way Gmail / Notion / Slack / ClickUp content already is.

Parity Contract

  • Legacy behavior preserved: existing gmail / notion / slack / clickup providers are completely untouched. Their SyncState KV namespaces are unchanged. GITHUB_CURATED (already in github/tools.rs) is byte-identical — no new tool actions added or removed.
  • Guard/fallback/dispatch parity checks: provider follows the existing ComposioProvider trait contract — daily budget, dedup-by-id, cursor-based pagination, idempotent persist_single_item upserts.
  • catalog_for_toolkit("github") continues to return github::GITHUB_CURATED (was already wired that way pre-PR; only the trait-impl side of github/ is new).

Duplicate / Superseded PR Handling

Summary by CodeRabbit

Release Notes

  • New Features
    • GitHub provider integration now available as a native memory provider with incremental issue synchronization
    • Automatic sync of assigned GitHub issues into Memory Tree with deduplication and progress tracking
    • GitHub user profile retrieval with viewer authentication support
    • Curated GitHub tools catalog and configurable sync intervals

…vider

Adds `GitHubProvider` next to the existing `gmail` / `notion` / `slack` /
`clickup` providers, joining them as the fifth native memory-ingest
provider in `composio/providers/`. Until now `github/mod.rs` declared
itself "curated tool catalog only — no native ComposioProvider
implementation yet"; this change closes that gap so the connected
user's assigned GitHub issues stream into the Memory Tree on the
periodic scheduler.

Implementation mirrors the ClickUp (tinyhumansai#2291, merged) and Linear (tinyhumansai#2402,
in review) providers 1:1 — same `SyncState` semantics, same
`persist_single_item` ingest path, same daily-budget discipline, same
"fetch-what-the-user-sees" assignee-scoped fetch.

## Sync model

  1. SyncState load + daily budget gate.
  2. Resolve viewer login via GITHUB_GET_AUTHENTICATED_USER.
  3. Re-check budget after the probe (per CodeRabbit lesson on
     tinyhumansai#2291 / tinyhumansai#2402 — never burn a list call when the probe just
     spent the last budget slot).
  4. Page through GITHUB_SEARCH_ISSUES with
     `is:issue assignee:<login> sort:updated-desc`. GitHub's search
     caps results at 1000 so pagination is naturally bounded; we
     also stop early on cursor boundary or short page.
  5. Per issue, persist as one memory document via
     persist_single_item. Composite `issue_id@updated_at` dedup key
     re-syncs edited issues.
  6. Advance cursor to newest updated_at, record last_sync_at_ms,
     save state.

Transport-error path in the pagination loop persists state before
returning the error (CodeRabbit lesson on tinyhumansai#2402: a flap mid-pagination
must not roll back budget accounting).

## Privacy posture

`assignee:<viewer_login>` is constructed inside the provider —
never accepted from a caller — so the privacy boundary can't be
tunnelled around. Matches the discipline gmail / notion / clickup /
linear already follow.

## Source-id convention

`composio-github-issue-<global_id>`. GitHub's `id` is globally unique
across all of GitHub so it's a stable upsert key. Document title
surfaces the canonical `owner/repo#number` form (e.g.
`GitHub tinyhumansai/openhuman#2408: …`) so search hits read the way
contributors refer to issues in conversation.

## Curated tool catalog

`GITHUB_CURATED` (already in `github/tools.rs` since the catalog-only
era) is unchanged — `GITHUB_GET_AUTHENTICATED_USER` and
`GITHUB_SEARCH_ISSUES` were already curated, which is the minimum the
sync path needs. No new actions added.

## Files

Added:
  - composio/providers/github/provider.rs   — GitHubProvider impl (~415)
  - composio/providers/github/sync.rs       — payload helpers (~360)
  - composio/providers/github/tests.rs      — 18 unit tests (~165)

Modified:
  - composio/providers/github/mod.rs        — declare new modules +
                                              re-export provider;
                                              remove stale
                                              "no native impl yet"
                                              disclaimer
  - composio/providers/mod.rs               — has_native_provider arm +
                                              native_provider_sync_interval
                                              arm + new regression test
                                              `capability_matrix_includes_github_as_native_memory_provider`
                                              (catalog_for_toolkit
                                              already pointed at
                                              github::GITHUB_CURATED;
                                              no change needed there)
  - composio/providers/registry.rs          — register_provider in
                                              init_default_providers
  - composio/providers/descriptions.rs      — GitHub description
                                              updated to mention
                                              Memory Tree sync

## Verification

  - cargo check --lib clean (pre-existing warnings only)
  - cargo test --lib composio::providers::github  40/40 pass
  - cargo test --lib composio::providers          305/305 pass
    (no regression on gmail / notion / slack / clickup)
  - cargo test --lib capability_matrix_includes_github  1/1 pass
  - cargo fmt --check clean
  - cargo clippy --lib --no-deps no new warnings in github/

Closes tinyhumansai#2408
@justinhsu1477 justinhsu1477 requested a review from a team May 21, 2026 05:31
@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

📝 Walkthrough

Walkthrough

This PR implements a complete GitHub provider for the Composio integration layer that enables incremental syncing of issues assigned to the authenticated user into Memory Tree. It includes JSON response normalization utilities, cursor-based pagination with daily request budgeting, persistent sync state management, and comprehensive test coverage, integrated into the global provider registry and capability matrix.

Changes

GitHub Provider Core

| Layer | File(s) | Summary |
|---|---|---|
| Sync helper functions and JSON extractors | src/openhuman/composio/providers/github/sync.rs | Adds utility functions to normalize Composio-wrapped REST/GraphQL responses: extract_issues probes multiple envelope paths; extract_issue_title/extract_issue_updated support wrapped and camelCase variants; extract_repo_qualified_identifier constructs owner/repo#number from multiple field sources; extract_viewer_login/extract_viewer_id enforce strict path validation; now_ms() returns epoch milliseconds. |
| GitHubProvider type and metadata | src/openhuman/composio/providers/github/provider.rs (lines 1–73) | Defines GitHubProvider struct with action constants, pagination limits, and issue ID paths; implements new(), Default, and metadata trait methods (toolkit_slug, sync_interval_secs, curated_tools). |
| User profile and viewer resolution | src/openhuman/composio/providers/github/provider.rs (lines 74–497) | Implements fetch_user_profile to extract viewer login/name/email/avatar; includes resolve_viewer_login helper that records budget usage and validates viewer presence. |
| Incremental issue sync with pagination and state | src/openhuman/composio/providers/github/provider.rs (lines 159–462) | Implements ComposioProvider::sync with cursor-based pagination, daily request budgeting, composite deduplication key (issue ID + updated_at), persistent SyncState management, and SyncOutcome telemetry. |
| Module structure and registry integration | src/openhuman/composio/providers/github/mod.rs, src/openhuman/composio/providers/descriptions.rs, src/openhuman/composio/providers/mod.rs, src/openhuman/composio/providers/registry.rs | Updates module layout to declare provider/sync and re-export GitHubProvider; adds GitHub to native-provider dispatch and description; registers GitHubProvider in global startup. |
| Test coverage for extractors and provider | src/openhuman/composio/providers/github/sync.rs (lines 181–347), src/openhuman/composio/providers/github/tests.rs, src/openhuman/composio/providers/mod.rs (lines 358–386) | Unit tests for JSON extractors across envelope variants and edge cases; metadata stability tests for GitHubProvider (slug, sync interval, curated tools); capability matrix integration test verifying GitHub as native provider with correct sync interval and memory ingest flags. |

Sequence Diagram

sequenceDiagram
  participant Client
  participant GitHubProvider
  participant SyncState
  participant ComposioApi as Composio API
  participant MemoryTree
  
  Client->>GitHubProvider: sync(reason)
  GitHubProvider->>SyncState: load persistent state
  GitHubProvider->>ComposioApi: GITHUB_GET_AUTHENTICATED_USER
  ComposioApi-->>GitHubProvider: viewer {login, name, email, ...}
  GitHubProvider->>GitHubProvider: check daily budget
  
  loop paginate through issues
    GitHubProvider->>ComposioApi: GITHUB_SEARCH_ISSUES (assignee:viewer, page)
    ComposioApi-->>GitHubProvider: {items: [...]}
    GitHubProvider->>GitHubProvider: extract & deduplicate issues
    GitHubProvider->>MemoryTree: persist unsynced issue documents
    GitHubProvider->>SyncState: mark items synced
  end
  
  GitHubProvider->>SyncState: advance cursor to newest updated_at
  GitHubProvider->>SyncState: save state
  GitHubProvider-->>Client: SyncOutcome {fetched, persisted, cursor, budget}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • #2408: The PR implements the exact GitHub provider code changes requested—new provider.rs, sync.rs, tests, and registry/native-provider integration with toolkit slug and sync interval configuration.

Suggested labels

working

Suggested reviewers

  • senamakel

Poem

🐰 A rabbit hops through GitHub's trees,
syncing issues with ease and breeze,
each assigned task now finds its place,
in Memory's safe and ordered space—
incremental magic, cursor-traced! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: promoting GitHub from a catalog-only toolkit to a full native memory provider with sync capabilities. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot added the working A PR that is being worked on by the team. label May 21, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/openhuman/composio/providers/github/mod.rs (1)

24-25: ⚡ Quick win

Prefer *_test.rs wiring for extracted module tests.

Since tests were extracted, wire them via a sibling *_test.rs file to match the repository convention.

♻️ Suggested update
 #[cfg(test)]
-mod tests;
+#[path = "github_test.rs"]
+mod tests;

Also rename src/openhuman/composio/providers/github/tests.rs to src/openhuman/composio/providers/github/github_test.rs.

As per coding guidelines: “When extracting Rust tests out of an implementation file, prefer a sibling *_test.rs file wired in with #[cfg(test)] #[path = "..._test.rs"] mod tests;.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/composio/providers/github/mod.rs` around lines 24 - 25, Replace
the current inline test module wiring (the existing #[cfg(test)] mod tests;) so
it points to a sibling test file using a path attribute and rename the extracted
tests file accordingly: create/rename
src/openhuman/composio/providers/github/tests.rs to github_test.rs and change
the module declaration to use #[cfg(test)] with #[path = "github_test.rs"] mod
tests; so the tests are loaded from the sibling github_test.rs file instead of
inline.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4551cde9-34d5-4fa2-b112-369040495c1a

📥 Commits

Reviewing files that changed from the base of the PR and between 6281aea and b89790c.

📒 Files selected for processing (7)
  • src/openhuman/composio/providers/descriptions.rs
  • src/openhuman/composio/providers/github/mod.rs
  • src/openhuman/composio/providers/github/provider.rs
  • src/openhuman/composio/providers/github/sync.rs
  • src/openhuman/composio/providers/github/tests.rs
  • src/openhuman/composio/providers/mod.rs
  • src/openhuman/composio/providers/registry.rs

@justinhsu1477
Contributor Author

Status note for anyone scanning this surface:

If anyone has a parallel GitHub-provider implementation in progress or planned, please flag in #2408 first so we can coordinate scope split.

Labels

working A PR that is being worked on by the team.

Development

Successfully merging this pull request may close these issues.

Add GitHub as a Composio memory provider (joining gmail / notion / slack / clickup / linear)

1 participant