
feat: add windsurf-trajectory-extractor #123

Merged
wey-gu merged 2 commits into nowledge-co:main from jijiamoer:feat/windsurf-trajectory-extractor
Mar 17, 2026

Conversation

@jijiamoer
Contributor

@jijiamoer jijiamoer commented Mar 13, 2026

Summary

A Python CLI tool for extracting Windsurf Cascade conversation trajectories with deep protobuf decoding.

Features

  • Thinking content extraction - Access internal reasoning that's not visible in the UI
  • Microsecond-precision timestamps - From protobuf, not JSON approximations
  • Complete tool call parameters - Full tool invocation details
  • Provider information - Model provider metadata
  • Cross-platform support - macOS, Linux, Windows

Technical Highlights

  • Pure Python standard library (no external dependencies)
  • Reverse-engineered protobuf structure for deep extraction
  • Supports both Windsurf and Windsurf - Next installations
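
The PR does not show how the installations are located. As a minimal sketch, assuming both editions follow the usual VS Code layout (`User/globalStorage/state.vscdb` under the per-platform application-data directory; only the macOS path for "Windsurf - Next" appears in this PR), candidate database paths could be built like this. `candidate_state_paths` and its parameters are hypothetical names, not the extractor's `find_windsurf_paths`.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath

# Installation names come from the PR; the directory layout is an assumption
# based on the usual VS Code convention, not taken from the extractor's code.
APP_NAMES = ("Windsurf", "Windsurf - Next")
STATE_SUFFIX = ("User", "globalStorage", "state.vscdb")


def candidate_state_paths(platform: str, home: str, appdata: str = "") -> list[PurePath]:
    """Return one candidate state.vscdb path per Windsurf installation."""
    if platform == "darwin":
        base = PurePosixPath(home, "Library", "Application Support")
    elif platform.startswith("linux"):
        base = PurePosixPath(home, ".config")
    elif platform == "win32":
        base = PureWindowsPath(appdata)  # typically the %APPDATA% directory
    else:
        raise ValueError(f"unsupported platform: {platform}")
    return [base.joinpath(app, *STATE_SUFFIX) for app in APP_NAMES]
```

Pure path classes let the function be exercised for every platform from any OS; a caller would then keep only the paths that exist, e.g. `candidate_state_paths(sys.platform, str(Path.home()), os.environ.get("APPDATA", ""))` filtered with `Path(p).exists()`.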

Differentiator

Unlike JSON-based extraction tools (e.g., ai-data-extraction), this performs protobuf decoding to access:

  • Thinking content that JSON methods cannot extract
  • Microsecond-precision timestamps
  • Complete step metadata
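
Decoding without the original `.proto` schema works at the wire-format level: each field starts with a varint key `(field_number << 3) | wire_type`, followed by a varint, a fixed 4- or 8-byte value, or a length-prefixed payload. The sketch below illustrates that general technique with the standard library only; it is not the extractor's actual code, and `read_varint`/`decode_fields` are hypothetical names. The field numbers it returns correspond to labels such as `f1` or `f20` in the README's structure notes.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]  # raises IndexError if the buffer ends mid-varint
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint longer than 10 bytes")


def decode_fields(buf: bytes) -> list[tuple[int, int, object]]:
    """Split one serialized message into (field_number, wire_type, value) triples.

    Varints come back as ints, everything else as raw bytes. A length-delimited
    payload may be a string, raw bytes or a nested message; calling
    decode_fields() on it again tests the nested-message case.
    """
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        else:
            if wire_type == 1:  # fixed 64-bit (double, fixed64)
                size = 8
            elif wire_type == 5:  # fixed 32-bit (float, fixed32)
                size = 4
            elif wire_type == 2:  # length-delimited
                size, pos = read_varint(buf, pos)
            else:  # 3/4 are deprecated groups; anything else is invalid
                raise ValueError(f"unsupported wire type {wire_type}")
            if pos + size > len(buf):
                raise ValueError("field runs past end of buffer")
            value, pos = buf[pos:pos + size], pos + size
        fields.append((field_number, wire_type, value))
    return fields
```

Whether a length-delimited payload is text or a nested message cannot be read off the wire format; deciding that field by field is where the reverse engineering comes in.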

Usage

```bash
# Install
pip install -e .

# List workspaces
windsurf-trajectory --list

# Extract trajectory
windsurf-trajectory -w WORKSPACE_ID -o trajectory.jsonl
```
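
Listing workspaces means reading the editor's state database first. VS Code-based editors keep it in a SQLite file, `state.vscdb`, with a key/value table named `ItemTable`; assuming Windsurf keeps that layout, a read-only lookup might look like the sketch below. `load_state_value` is a hypothetical helper, and `key` is a placeholder because the PR does not name the key that holds the Cascade state.

```python
import json
import sqlite3
from contextlib import closing
from pathlib import Path


def load_state_value(db_path: str, key: str):
    """Read and JSON-decode one entry from a VS Code-style state.vscdb.

    Assumes the usual ItemTable(key, value) layout; returns None if the key
    is absent.
    """
    # mode=ro: never write to a database the running editor may hold open.
    # as_uri() percent-encodes spaces such as the one in "Application Support".
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        row = conn.execute(
            "SELECT value FROM ItemTable WHERE key = ?", (key,)
        ).fetchone()
    return None if row is None else json.loads(row[0])
```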

Files

  • src/windsurf_trajectory/extractor.py - Core extraction logic (~480 lines)
  • src/windsurf_trajectory/cli.py - CLI interface
  • examples/sample_output.jsonl - Sample output format

Checklist

  • Code follows project conventions
  • README with usage instructions
  • MIT License
  • pyproject.toml for packaging
  • Example output included
  • Passed ruff check/format

Summary by CodeRabbit

  • New Features

    • Added two new trajectory extractor integrations: Windsurf Trajectory Extractor for conversation history and Antigravity Trajectory Extractor for conversation trajectories.
  • Documentation

    • Updated documentation with information about the new integrations, including installation instructions and descriptions.

A Python CLI tool for extracting Windsurf Cascade conversation trajectories
with deep protobuf decoding.

Features:
- Thinking content extraction (internal reasoning, not visible in UI)
- Microsecond-precision timestamps from protobuf
- Complete tool call parameters
- Provider information
- Cross-platform support (macOS, Linux, Windows)

Technical highlights:
- Pure Python standard library (no external dependencies)
- Reverse-engineered protobuf structure for deep extraction
- Supports both 'Windsurf' and 'Windsurf - Next' installations

Differentiator from existing tools:
- Unlike JSON-based extraction, this performs protobuf decoding
- Extracts thinking content that JSON methods cannot access
- Provides microsecond-precision timestamps
@coderabbitai

coderabbitai bot commented Mar 13, 2026

Walkthrough

The pull request adds two new git submodules (windsurf-trajectory-extractor and antigravity-trajectory-extractor) to the repository configuration and updates the README documentation to list these integrations with their respective GitHub URLs and descriptions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Submodule Configuration: `.gitmodules`, `windsurf-trajectory-extractor`, `antigravity-trajectory-extractor` | Adds two new submodule entries pointing to external repositories with specific commit references. |
| Documentation: `README.md` | Updates the Integrations table with two new rows documenting the Windsurf and Antigravity trajectory extractors, including GitHub URLs and brief descriptions. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 Two new trails now join our warren,
Windsurf breeze and gravity's pardon,
With submodules linked and docs so fair,
The repo grows with utmost care! 🌬️✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title only mentions windsurf-trajectory-extractor, but the PR adds two submodules (windsurf-trajectory-extractor AND antigravity-trajectory-extractor) and updates the README for both. | Update the title to reflect both submodules being added, such as: 'feat: add windsurf and antigravity trajectory extractors as submodules' or 'refactor: move trajectory extractors to submodules' |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (9)
windsurf-trajectory-extractor/.gitignore (1)

45-46: Consider narrowing the global *.jsonl ignore rule.

*.jsonl at repo scope may hide legitimate fixtures/docs added outside examples/ later. If the intent is only generated outputs, consider a more targeted pattern (or a clearer naming convention for generated files).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@windsurf-trajectory-extractor/.gitignore` around lines 45 - 46, The global
ignore entry "*.jsonl" is too broad and may hide non-generated fixtures; replace
it with a narrower pattern that targets generated outputs (for example a
directory-specific pattern or a naming convention) and keep the current
exception "!examples/*.jsonl" if examples should remain tracked; update the
.gitignore by removing the top-level "*.jsonl" and adding a more specific rule
such as a generated/ or outputs/ directory pattern (or a suffix like
"*.generated.jsonl") so only intended files are ignored while legitimate JSONL
assets outside examples remain visible.
windsurf-trajectory-extractor/README.md (2)

104-119: Add language specifier to protobuf structure block.

The protobuf structure documentation block lacks a language identifier.

📝 Proposed fix
-```
+```text
 Top-level:
   f1 (string): Trajectory UUID
   ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@windsurf-trajectory-extractor/README.md` around lines 104 - 119, The fenced
protobuf structure block starting with "Top-level:" (showing fields like f1, f2,
repeated Step, f20 etc.) lacks a language specifier; update the opening
triple-backtick to include a language tag (for example ```text) so the block
becomes a proper code block with a language identifier, leaving the block
contents unchanged and keeping the closing triple-backtick as-is.

96-98: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified for proper syntax highlighting and accessibility.

📝 Proposed fix
-```
+```text
 ~/Library/Application Support/Windsurf - Next/User/globalStorage/state.vscdb

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@windsurf-trajectory-extractor/README.md` around lines 96 - 98, The fenced
code block in README.md containing the path ~/Library/Application Support/Windsurf - Next/User/globalStorage/state.vscdb lacks a language
specifier; update that code fence to include a language tag (e.g., use "text")
so the block starts with ```text to satisfy markdownlint and enable proper
highlighting/accessibility.

<details>
<summary>windsurf-trajectory-extractor/src/windsurf_trajectory/extractor.py (5)</summary><blockquote>

`479-480`: **Consider logging or narrowing exception handling.**

The `except Exception: pass` silently swallows every error during the keyword search. This provides resilience, but it can also hide real problems such as malformed data. Consider narrowing it to the exceptions the search can actually raise, or at least logging the error in debug mode.
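
A minimal sketch of the logging variant, assuming records are base64 strings keyed by an id (as the proposed sizing fix further down assumes); `search_records` is a hypothetical stand-in, not the extractor's `find_by_keywords`. Only `binascii.Error`, which `base64.b64decode` raises for corrupt input, is caught, so unexpected bugs still surface.

```python
import base64
import binascii
import logging

logger = logging.getLogger(__name__)


def search_records(blobs: dict[str, str], keywords: list[str]) -> list[str]:
    """Return ids of base64-encoded records whose bytes contain every keyword."""
    needles = [kw.encode("utf-8") for kw in keywords]
    matches = []
    for record_id, encoded in blobs.items():
        try:
            blob = base64.b64decode(encoded, validate=True)
        except binascii.Error as exc:
            # Expected for corrupt entries: skip it, but leave a debug trace.
            logger.debug("skipping record %r: %s", record_id, exc)
            continue
        # Protobuf stores string fields as raw UTF-8, so a byte search finds them.
        if all(needle in blob for needle in needles):
            matches.append(record_id)
    return matches
```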



<details>
<summary>♻️ Proposed fix</summary>

```diff
-        except Exception:
-            pass
+        except (ValueError, UnicodeDecodeError, KeyError):
+            continue
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@windsurf-trajectory-extractor/src/windsurf_trajectory/extractor.py` around
lines 479 - 480, Replace the silent "except Exception: pass" in the keyword-search
block with a handler that either narrows the caught types (e.g., ValueError,
UnicodeDecodeError, KeyError) or binds the exception as e and logs it instead of
silencing it; specifically, in the keyword search logic inside
windsurf_trajectory/extractor.py (the try/except around the keyword matching
code), call the module logger (e.g.,
logger.debug(...) or logger.exception(...)) with context about the record and
the exception, or re-raise for non-recoverable errors if appropriate, so
malformed data/errors aren’t silently dropped.
```

</details>

---

`253-259`: **Size calculation is inconsistent with other functions.**

`len(str(v))` counts the characters of the base64 string as stored in the state JSON, while `extract_trajectory` uses `len(blob)`, the byte count after decoding. Because base64 encodes every 3 bytes as 4 characters, `--list` reports sizes roughly a third larger than the extract output.
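
A small sketch of the consistent measurement, assuming the stored value is a base64 string as the proposed fix implies; `stored_size` is a hypothetical helper name. It narrows the fallback to `binascii.Error` rather than `Exception`.

```python
import base64
import binascii


def stored_size(value: str) -> int:
    """Decoded byte size of a base64 state value, matching len(blob) in extract."""
    try:
        return len(base64.b64decode(value))
    except binascii.Error:
        # Not valid base64: fall back to the raw character count.
        return len(value)
```

For 300 decoded bytes the stored string is 400 characters long, which is exactly the 4/3 inflation `--list` currently reports.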



<details>
<summary>♻️ Proposed fix for consistent sizing</summary>

```diff
         if "cachedActiveTrajectory" in k:
             ws_id = k.split(":")[-1]
+            try:
+                blob = base64.b64decode(v)
+                size = len(blob)
+            except Exception:
+                size = len(str(v))
             workspaces.append(
                 {
                     "id": ws_id,
-                    "size": len(str(v)),
+                    "size": size,
                     "path": workspace_name(ws_storage, ws_id),
                 }
             )
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@windsurf-trajectory-extractor/src/windsurf_trajectory/extractor.py` around
lines 253 - 259, The listed workspace "size" is computed with len(str(v)) which
measures JSON text length and mismatches extract_trajectory's len(blob); change
the computation in the workspaces.append block to compute the actual decoded
blob byte length the same way extract_trajectory does (decode/deserialize v to
the binary blob and use len(blob)) so sizes reported by the listing match sizes
produced by extract_trajectory; look at the variables v, workspaces.append and
the extract_trajectory sizing logic to mirror its decoding and length call.
```

</details>

---

`219-223`: **Timestamp range may become a "time bomb" in 2040.**

The comment acknowledges this, but the upper bound of 2208988800 (Jan 2040) means this tool will reject valid timestamps after that date. Consider either removing the upper bound or extending it significantly.
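
For reference, 1577836800 is 2020-01-01 and 2208988800 is 2040-01-01 (UTC), while 4102444800 is 2100-01-01. The sketch below applies the wider bound, assuming the value arrives as a protobuf-style `(seconds, nanos)` pair as in the proposed fix; `timestamp_to_datetime` is a hypothetical name. It also adds the nanoseconds as integer microseconds, because `seconds + nanos / 1e9` passes through a float whose resolution at current epoch values is only about a quarter of a microsecond, so the result can occasionally be off by one microsecond.

```python
from datetime import datetime, timedelta, timezone

MIN_SECONDS = 1577836800  # 2020-01-01T00:00:00Z
MAX_SECONDS = 4102444800  # 2100-01-01T00:00:00Z


def timestamp_to_datetime(seconds: int, nanos: int, tz=timezone.utc):
    """Convert a (seconds, nanos) timestamp to an aware datetime, or None."""
    if not MIN_SECONDS < seconds < MAX_SECONDS:
        return None  # outside the plausible range: treat as not-a-timestamp
    base = datetime.fromtimestamp(seconds, tz=tz)
    # Integer arithmetic: truncate nanos to whole microseconds, no float rounding.
    return base + timedelta(microseconds=nanos // 1000)
```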



<details>
<summary>♻️ Proposed fix</summary>

```diff
     # Sanity check: timestamp should be reasonable (2020-2100)
-    if 1577836800 < seconds < 2208988800:
+    if 1577836800 < seconds < 4102444800:  # 2020-2100
         return datetime.fromtimestamp(seconds + nanos / 1e9, tz=tz)
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@windsurf-trajectory-extractor/src/windsurf_trajectory/extractor.py` around
lines 219 - 223, The timestamp sanity check in extractor.py that returns None
for seconds outside (1577836800, 2208988800) will reject valid post-2040
timestamps; update the logic in the block handling seconds/nanos before
datetime.fromtimestamp (the variables named seconds and nanos and the call to
datetime.fromtimestamp(..., tz=tz)) to either remove the upper bound entirely
and only guard against obviously invalid values (e.g., negative seconds) or
extend the upper bound to a far-future value (e.g., year 2100+), and ensure the
function's return None behavior remains consistent when timestamps are truly
invalid.
```

</details>

---

`36-37`: **Hardcoded timezone may not suit all users.**

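`DEFAULT_TZ` is hardcoded to UTC+8, labeled CST (China Standard Time here; the abbreviation is ambiguous, since CST also commonly means US Central Standard Time, UTC-6). Consider either documenting this prominently or defaulting to the system's local timezone.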



<details>
<summary>♻️ Option: Use local timezone</summary>

```diff
-DEFAULT_TZ = timezone(timedelta(hours=8))  # CST
+# Use local timezone by default
+DEFAULT_TZ = datetime.now().astimezone().tzinfo
```
</details>

Or simply document the CST default more prominently so users know to pass a custom `tz` parameter.
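
If the local-timezone option is chosen, note that `datetime.now().astimezone().tzinfo` captures a fixed UTC offset at import time, so timestamps on the other side of a daylight-saving change would be displayed an hour off. A sketch that resolves the zone per timestamp instead, assuming a `tz=None` default means "system local time"; `to_display_time` is a hypothetical name.

```python
from datetime import datetime, timezone


def to_display_time(seconds: int, tz=None) -> datetime:
    """Convert a Unix timestamp to an aware datetime.

    tz=None selects the system local time zone; astimezone() looks up the
    offset valid at that instant, so DST is handled per timestamp.
    """
    return datetime.fromtimestamp(seconds, tz=timezone.utc).astimezone(tz)
```

Callers who want the previous behavior can still pass `tz=timezone(timedelta(hours=8))` explicitly.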

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@windsurf-trajectory-extractor/src/windsurf_trajectory/extractor.py` around
lines 36 - 37, DEFAULT_TZ is hardcoded to CST (UTC+8); change it to use the
system local timezone and ensure callers can still pass a custom tz: replace the
constant assignment with a runtime lookup (e.g., derive tz via
datetime.now().astimezone().tzinfo) and update any functions that reference
DEFAULT_TZ (look for references to DEFAULT_TZ and any function parameters named
tz) to default tz=None and resolve to the system tz when None; also update the
module docstring/comments to clearly state that the default is the system local
timezone and how to pass a custom tz.
```

</details>

---

`201-204`: **Consider narrowing exception handling.**

The `except Exception:` handler catches broadly. For defensive protobuf parsing this is often acceptable, and because it is not a bare `except:` it already lets `KeyboardInterrupt` and `SystemExit` propagate. Narrowing it to the errors the parser can actually raise would additionally stop it from masking genuine bugs.



<details>
<summary>♻️ Proposed fix</summary>

```diff
-        except Exception:
+        except (ValueError, struct.error, IndexError):
             break
```
</details>

Alternatively, if broad catching is intentional for unknown malformed data, the current approach is acceptable given the parsing context.
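
A sketch of what the narrowed handler buys in a schema-less parse loop, assuming (hypothetically) only varint and fixed64 fields: truncated or malformed input ends the loop and keeps the fields read so far, while any other exception type still propagates. `parse_until_malformed` is an illustrative name, not the extractor's function.

```python
import struct


def _read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    value = shift = 0
    while True:
        byte = buf[pos]  # IndexError on truncated input
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def parse_until_malformed(buf: bytes) -> list[tuple[int, object]]:
    """Collect (field, value) pairs, stopping at the first malformed field."""
    fields, pos = [], 0
    while pos < len(buf):
        try:
            key, pos = _read_varint(buf, pos)
            field, wire_type = key >> 3, key & 0x7
            if wire_type == 0:
                value, pos = _read_varint(buf, pos)
            elif wire_type == 1:
                (value,) = struct.unpack_from("<d", buf, pos)  # struct.error if short
                pos += 8
            else:
                raise ValueError(f"wire type {wire_type} not handled here")
        except (ValueError, struct.error, IndexError):
            break  # expected parse failures only; other bugs still propagate
        fields.append((field, value))
    return fields
```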

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@windsurf-trajectory-extractor/src/windsurf_trajectory/extractor.py` around
lines 201 - 204, The handler around the protobuf parsing loop is already
"except Exception:" (not a bare except:), so KeyboardInterrupt and SystemExit
are not swallowed. If narrowing is wanted, replace it with the specific errors
the parser raises (e.g., "except (ValueError, struct.error, IndexError):") in
windsurf_trajectory/extractor.py, keeping the break so parsing stops at
malformed data; otherwise leave the broad handler as-is.
```

</details>

</blockquote></details>
<details>
<summary>windsurf-trajectory-extractor/src/windsurf_trajectory/__init__.py (1)</summary><blockquote>

`1-7`: **Consider re-exporting public API for better ergonomics.**

The package root only exposes `__version__`. For library consumers, it would be more convenient to import directly from `windsurf_trajectory` rather than `windsurf_trajectory.extractor`.



<details>
<summary>♻️ Optional: Re-export public API</summary>

```diff
 """Windsurf Trajectory Extractor - Deep extraction of Cascade conversation history.
 
 This tool extracts complete trajectory data from Windsurf's internal storage,
 including thinking content, tool calls, and microsecond-precision timestamps.
 """
 
 __version__ = "0.1.0"
+
+from .extractor import (
+    DEFAULT_TZ,
+    extract_trajectory,
+    find_by_keywords,
+    find_windsurf_paths,
+    list_summaries,
+    list_workspaces,
+    load_codeium_state,
+    workspace_name,
+)
+
+__all__ = [
+    "__version__",
+    "DEFAULT_TZ",
+    "extract_trajectory",
+    "find_by_keywords",
+    "find_windsurf_paths",
+    "list_summaries",
+    "list_workspaces",
+    "load_codeium_state",
+    "workspace_name",
+]
```
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@windsurf-trajectory-extractor/src/windsurf_trajectory/__init__.py` around
lines 1 - 7, The package root currently only exposes __version__; to make
imports ergonomic, re-export the public API from windsurf_trajectory.extractor
by importing the extractor's public symbols (e.g., classes/functions like
whatever public names are defined in windsurf_trajectory.extractor) into
windsurf_trajectory.__init__ and adding them to __all__ (or expose the module as
extractor via "from . import extractor as extractor") so consumers can "from
windsurf_trajectory import <PublicName>" instead of importing from
windsurf_trajectory.extractor; keep __version__ and ensure types and docstrings
are preserved.
```

</details>

</blockquote></details>



---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `7aa7df9d-a8a7-4332-9c1a-c744523c84f8`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 58b78ef0ec134795b98a17033d2e5283f5d5a472 and a0f9cab1228a4fbc491644093a5cac47f1f372d5.

</details>

<details>
<summary>📒 Files selected for processing (8)</summary>

* `windsurf-trajectory-extractor/.gitignore`
* `windsurf-trajectory-extractor/LICENSE`
* `windsurf-trajectory-extractor/README.md`
* `windsurf-trajectory-extractor/examples/sample_output.jsonl`
* `windsurf-trajectory-extractor/pyproject.toml`
* `windsurf-trajectory-extractor/src/windsurf_trajectory/__init__.py`
* `windsurf-trajectory-extractor/src/windsurf_trajectory/cli.py`
* `windsurf-trajectory-extractor/src/windsurf_trajectory/extractor.py`

</details>

</details>


@wey-gu
Member

wey-gu commented Mar 13, 2026

Thanks a lot for the amazing contribution!

@wey-gu
Member

wey-gu commented Mar 17, 2026

@jijiamoer could you please update this change to add the two repos as submodules in the community repo, like we did for nowledge-mem-gemini-cli?

Replace the in-tree Windsurf sources with dedicated Antigravity and
Windsurf submodules so the community repo follows the maintainer-requested
integration pattern and points README entries at the standalone repos.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitmodules:
- Around line 4-9: Replace the SSH URLs for the two submodules
"windsurf-trajectory-extractor" and "antigravity-trajectory-extractor" with
their HTTPS equivalents in the .gitmodules entry (change
git@github.com:owner/repo.git to https://github.com/owner/repo.git), then sync
the change so existing clones pick up the new URL (e.g., run git submodule sync
and git submodule update --init --recursive) to avoid SSH auth failures for
contributors/CI.
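
The rewrite itself is mechanical. A minimal sketch that converts scp-style GitHub SSH URLs, using the `owner/repo` placeholders from the comment above; `ssh_to_https` is a hypothetical helper and leaves any other URL unchanged.

```python
import re

# Matches git@github.com:owner/repo or git@github.com:owner/repo.git
_SCP_STYLE = re.compile(r"^git@github\.com:(?P<path>[^/]+/[^/]+?)(?:\.git)?$")


def ssh_to_https(url: str) -> str:
    """Rewrite an scp-style GitHub SSH URL to its HTTPS equivalent."""
    match = _SCP_STYLE.match(url)
    if match is None:
        return url  # already HTTPS, or not a GitHub SSH URL
    return f"https://github.com/{match['path']}.git"
```

After `.gitmodules` is updated, the `git submodule sync` step is still needed so existing clones pick up the new URLs.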

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 86d20eeb-7ef2-4596-8a34-690172c2ecb6

📥 Commits

Reviewing files that changed from the base of the PR and between a0f9cab and 0e4ac02.

📒 Files selected for processing (4)
  • .gitmodules
  • README.md
  • antigravity-trajectory-extractor
  • windsurf-trajectory-extractor
✅ Files skipped from review due to trivial changes (2)
  • antigravity-trajectory-extractor
  • windsurf-trajectory-extractor

Comment thread .gitmodules
@wey-gu
Member

wey-gu commented Mar 17, 2026

THANKS so much @jijiamoer !!

@wey-gu wey-gu merged commit ec60491 into nowledge-co:main Mar 17, 2026
1 check passed

Labels

None yet

Projects

None yet


2 participants