</p>

<p align="center"><strong>Turn AI coding runs into portable, replayable, benchmark-ready task bundles.</strong></p>

<p align="center">A practical format between raw chat logs and heavyweight benchmark platforms.</p>

<p align="center">
  <a href="#quickstart"><strong>Quick Start</strong></a> ·
  <a href="#real-bundles"><strong>Real Output</strong></a> ·
  <a href="#format-vs-alternatives"><strong>Why This Format</strong></a> ·
  <a href="./docs/bundle-format.md"><strong>Bundle Format</strong></a> ·
  <a href="./docs/sample-benchmark-report.md"><strong>Sample Report</strong></a> ·
  <a href="./ROADMAP.md"><strong>Roadmap</strong></a> ·
  <a href="./docs/branding.md"><strong>Brand Assets</strong></a>
</p>

Package a task once, inspect it later, compare tools on the same starting point, and generate benchmark-style reports from real artifacts.

It helps you:
- turn one AI coding run into a clean, shareable directory instead of leaving it scattered across screenshots, transcripts, or loose patches
- compare Codex, Claude Code, Cursor, or internal agents using metadata, hashes, and outcome fields
- generate benchmark-style reports from a folder of bundles without standing up a full evaluation platform first
- preserve enough context for reruns and comparisons without requiring token-perfect recording

It fits the gap between raw logs and full evaluation systems: light enough for day-to-day work, structured enough for replay and benchmarking.

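Under the hood, the "hashes" mentioned above need be nothing more exotic than a per-artifact SHA-256 digest, so that two bundles can be compared byte-for-byte. A minimal sketch using Node's built-in `crypto` module; the `{ path, sha256 }` record shape here is illustrative, not the actual bundle schema:

```typescript
import { createHash } from "node:crypto";
import { readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hash one artifact file; identical bytes always produce identical digests,
// which is what makes cross-tool comparison trustworthy.
function hashArtifact(path: string): { path: string; sha256: string } {
  const digest = createHash("sha256").update(readFileSync(path)).digest("hex");
  return { path, sha256: digest };
}

// Example: two files with the same content hash to the same digest.
const fileA = join(tmpdir(), "artifact-a.txt");
const fileB = join(tmpdir(), "artifact-b.txt");
writeFileSync(fileA, "hello\n");
writeFileSync(fileB, "hello\n");
const a = hashArtifact(fileA);
const b = hashArtifact(fileB);
console.log(a.sha256 === b.sha256); // same bytes, same hash
```
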
It is designed for workflows where you want to:
- inspect what happened

If you want the shortest possible proof that the project already works, this is it.

![Task Bundle workflow overview](./assets/workflow-overview.svg)

<a id="real-bundles"></a>

## See It On Real Bundles
2. Fix greeting punctuation | claude-code / claude-sonnet-4 | success | score 0.89
```

Browse the committed example report:
- [docs/sample-benchmark-report.md](./docs/sample-benchmark-report.md)
- [docs/sample-benchmark-report.zh-CN.md](./docs/sample-benchmark-report.zh-CN.md)
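
A ranking like the one shown above is, at its core, just a sort over outcome fields. A minimal sketch, assuming a simplified `{ task, tool, status, score }` outcome record (illustrative, not the actual bundle schema):

```typescript
// Simplified outcome records; the real bundle schema may differ.
interface Outcome {
  task: string;
  tool: string;
  status: "success" | "failure";
  score: number;
}

// Rank successful runs first, then by descending score.
function rank(outcomes: Outcome[]): Outcome[] {
  return [...outcomes].sort((a, b) => {
    if (a.status !== b.status) return a.status === "success" ? -1 : 1;
    return b.score - a.score;
  });
}

const ranked = rank([
  { task: "Fix greeting punctuation", tool: "claude-code", status: "success", score: 0.89 },
  { task: "Fix greeting punctuation", tool: "codex", status: "failure", score: 0.1 },
  { task: "Add hello endpoint", tool: "codex", status: "success", score: 0.95 },
]);
for (const o of ranked) {
  console.log(`${o.task} | ${o.tool} | ${o.status} | score ${o.score}`);
}
```

Because each bundle carries its outcome as structured data rather than prose, a report generator only needs to read a folder of bundles and apply an ordering like this.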

<a id="format-vs-alternatives"></a>

## How It Compares To Common Alternatives

| Need | Chat logs | Zip or tarball | Full benchmark platform | Task Bundle |
| --- | --- | --- | --- | --- |
| Share the original task and result together | Partial | Yes | Yes | Yes |
| Compare different tools on the same starting point | Weak | Manual | Yes | Yes |
| Carry artifact hashes and outcome metadata | No | No | Yes | Yes |
| Stay lightweight enough for everyday coding workflows | Yes | Yes | No | Yes |
| Grow into replay and benchmark workflows later | Weak | Weak | Yes | Yes |

## Why It Matters

Most AI coding work disappears into screenshots, transcripts, or one-off patches.
Task Bundle gives you a durable unit you can inspect, archive, compare, and validate.

- agent builders who want reproducible tasks
- eval and benchmark authors who need structured task artifacts
- teams comparing Codex, Claude Code, Cursor, or custom tools
- researchers who care about re-execution over token-perfect replay

## What Replay Means Here


You can also point `taskbundle report` at the same directory to generate a small benchmark-style leaderboard.

For a committed snapshot of that output, see [docs/sample-benchmark-report.md](./docs/sample-benchmark-report.md).

## Bundle Format At A Glance

- `bundle.json`: top-level metadata and artifact pointers