fix(reviewer): surface author rationale to reviewer agents by AbirAbbas · Pull Request #25 · Agent-Field/pr-af

AbirAbbas · 2026-05-07T21:41:36Z

Summary

Reviewer agents were producing findings that contradicted explicit design rationale stated in PR descriptions, because the rationale never reached them. Three structural causes:

Intake/anatomy truncated the description aggressively. Intake gate kept the first 500 chars, intake fallback 1000, anatomy 500. A 1100-char PR body with the rationale in the second half was literally cut off before any model saw it.
The reviewer agent never saw the raw description. It received pr_narrative, risk_surfaces, and intake_summary — all digested summaries. The author's voice was laundered through two summarization layers before the reviewer looked at code.
The anatomy prompt explicitly tells the model to discount the description ("what the CODE says, not what the PR description says"). That framing is correct for diff-vs-description divergence detection, but it leaves the reviewer with no channel for author intent.

Changes

intake_phase (gate + fallback) and anatomy_phase: bump description truncation from 500/1000/500 → 4000 chars uniformly. Catches typical thoughtful PR bodies without unbounded prompt growth.
review_dimension: new pr_description parameter, plus a dedicated "Author's Stated Intent" section near the top of the assembled prompt. The framing tells the reviewer not to defer to the description — but if a finding contradicts a design choice the author explicitly justified, the reviewer must rebut the stated rationale on its merits, not flag as if the rationale wasn't given. Findings on points the description is silent about are unaffected.
ReviewOrchestrator._run_parallel_review: wire self.pr_data.description through to review_dimension.

Background

Investigated against a real review where the reviewer flagged "add idempotency check before POST" and "silent failure on 403" — both contradicting explicitly-stated rationale ("POST is additive on purpose", "fail-soft is intentional") sitting in the PR body the reviewer never saw. The same root cause produces a class of bad findings on any PR whose author bothers to explain why.

Test plan

Existing tests pass — 25 pass, 1 pre-existing unrelated cost_tracker failure on main baseline (not introduced by this PR)
Imports clean (from pr_af.orchestrator import ReviewOrchestrator etc.)
Manual: re-run a review against a PR with substantive design rationale in the body and confirm findings either get dropped or get rebutted on the merits when they contradict that rationale

🤖 Generated with Claude Code

- Add recursive sub-review spawning with depth control (max_review_depth param) - Fix GitHub comment posting: path normalization, diff line range filtering - Add REQUEST_CHANGES → COMMENT fallback for own-PR token case - Fix CP async execute: wrap params under 'input' key - Add SubReviewRequest schema and _SubReviewRequest harness schema - Wire max_review_depth through ReviewInput → BudgetConfig → orchestrator - All 8 reasoners tracked in CP with VC generation (20 VCs per run) - Successfully posted 4 inline comments to pysktb PR #10

- Rewrite _format_comment_body: multiline evidence as blockquotes, suggestion mode 'comment' renders as plain text (not confusing commented-out code), simplified metadata line - Rewrite _format_summary: overall rating/grade at top, AgentField branding at top, PR overview in collapsible section, key findings with blocking/non-blocking split, all findings by severity (collapsible), review process details (collapsible), pipeline stats (collapsible), normal-sized footer - Add _compute_rating, _build_key_findings, _build_review_details, _build_pipeline_stats helpers - Lower min_severity default to 'nitpick', reduce confidence thresholds, raise max_duration_seconds to 1800s - Add severity calibration guidance to reviewer harness prompt - Add suggestion_mode toggle ('comment' default vs 'code') - Fix PR branch checkout for new files in diff - Docker: shared workspace volume, XDG_DATA_HOME persistence

…LLM-as-judge evaluation - Rewrite all 7 harness prompts for architecture-level code review depth (anatomy, planning, review_dimension, cross_ref, adversary, intake, coverage) - Fix repo clone for large repos: shallow clone (--depth 1 --no-tags --no-checkout), 600s timeout, shallow PR ref fetch with 300s timeout - Improve coverage gap prompt in orchestrator - Add benchmark/truenas-middleware-18291/ with full dry run results: pr-af-result.json (25 findings, 8 critical), claude-code data (4 findings), EVALUATION.md comparing PR-AF vs Claude Code as LLM-as-a-judge

…parallel adversary batching Replace single planning phase with 3 parallel meta-selectors (Semantic, Mechanical, Systemic) that generate review dimensions from independent analytical lenses. Add 3-level false-positive reduction: Level 1 in review_dimension prompt (reachability/evidence/confidence gates), Level 2 via cross-meta deduplication, Level 3 via parallel adversary batching. - Add MetaDimensionResult + MetaSelectorConfig schemas - Add meta_semantic, meta_mechanical, meta_systemic reasoners - Refactor orchestrator: _run_meta_selectors, _dedup_cross_meta, _run_parallel_adversary, reorder adversary before cross_ref - Update phase budgets (meta_selectors: 0.30, adversary: 0.40)

…LM-as-judge evaluation Rewrites EVALUATION.md as a model comparison on TrueNAS middleware PR #18291. Sonnet scores highest (0.828) with near-perfect precision; Kimi wins breadth (0.727). Adds separate result files for each model run.

… benchmark - Enrich all pipeline phases with full context (diff patches, PR narrative, risk surfaces, blast radius, intake summary, dimension dedup awareness) - Add _write_context_file() for large context → file fallback (.harness reads) - Add _build_file_patches(), _escalate_depth(), _cleanup_context_dir() to orchestrator - Upgrade meta-selector prompts: Investigation Protocol instructs harness to explore repo (browse files, trace callers, check dependencies) not just analyze pre-digested JSON blobs - Add Quality Gate: blocks dimensions generated solely from diff text - Upgrade review_dimension: verify claims against actual code before reporting - Upgrade adversary: must read actual files to verify reviewer claims - Fix Python 3.11 compat: replace nested f-strings with backslashes - Add adaptive depth escalation (quick→standard→deep) based on risk signals - Kimi k2.5 enriched run on PR #254: 25 findings, 8 critical, 16 adversary confirmed, 1994s duration

…ification harness - Add evidence.py: parallel programmatic code extraction (zero-cost context feed) - EvidencePackage model with primary_code, caller_snippets, cross_ref_snippets - Async extraction with semaphore(10), Python 3.11 compatible - Add evidence_verifier harness: independently investigates critical/important findings - Meta-prompted to browse repo, verify claims against actual code - Returns verified/falsified status with revised severity/confidence - Update adversary_phase: receives ground truth evidence + verification notes - Compares reviewer claims vs extracted code before challenging - Update cross_ref_phase: receives import context and caller snippets per finding - Update orchestrator: wire extraction → verification → adversary → cross_ref - Falsified findings demoted before adversary phase - Coverage loop also extracts evidence for gap findings

- 25 findings, 0% FP rate (down from 10% in Sonnet baseline) - 4 findings falsified by verification harness before adversary phase - 17 true positives, 5 valid suggestions, 2 duplicates - errors.Is finding confirmed as TRUE POSITIVE (line 5186 uses string comparison) - ~43min runtime (+7% overhead vs enriched-only run)

- Implemented Parallel Compound Analysis to discover novel attack chains - Replaced old logic entirely - Added a dedup phase for the newly discovered compound findings - Adjusted Docker health checks to prevent timeouts under parallel load - Added EVALUATION.md documenting PR-AF improvements vs Claude Code baseline

- Added viral/developer-facing formatting and badges - Highlighted the single-agent ceiling vs multi-agent DAG approach - Documented the 8-phase architecture - Showcased the compound analysis differentiator using the PR 254 benchmark

- Removed mechanical 8-phase pipeline table - Emphasized 0% false positive rate via Evidence Grounding - Highlighted Compound Attack Chain synthesis - Added focus on Adversarial Tension and Dynamic Meta-prompts - Strengthened viral/powerful messaging against single-agent baselines

- Added ci_runner.py to handle async execution polling - Updated README with drop-in GitHub Actions workflow template - Configured to use native GITHUB_TOKEN for zero-config deployment - Set workflow trigger to act on 'pr-af' label

- Appended 'Reviewed by AgentField PR-AF' to all inline comments - Added stylish AgentField shield badge to main PR summary - Pointed internal links to the new Agent-Field/pr-af repo

- Removed overly verbose architectural terminology and DAG references - Added Ecosystem Comparison table featuring Claude Code and Commercial SaaS (Codex/CodeRabbit) - Made the tone more objective, pointed, and respectful of other tools in the space - Focused explicitly on the core value props: Evidence Grounding, Compound Analysis, Adversarial Red Teaming

…chitecture - Removed references to 'Red Team agent' and simple 'agents' to avoid sounding trivial - Replaced with 'Falsifiability Gates' and 'auto-invalidation process' - Enhanced 'Compound Vulnerability Synthesis' to focus on cross-correlated risks and systemic coalescence - Framed the architecture as a 'massively parallel cognitive pipeline'

- Added a 'Dynamic Pipeline Architecture' section detailing how PR-AF morphs its execution graph based on PR topology - Authored a complex, highly technical Mermaid diagram showing the flow from Intake -> Lenses -> Verification -> Compound Synthesis - Elevated language to emphasize large-scale cognitive orchestration rather than static scripts

- Add /webhook/github endpoint for @mention-triggered PR reviews - Add GitHub App authentication (JWT + installation tokens) to GitHubClient - Reviews post as pr-af[bot] when GitHub App is configured, falls back to PAT - Add railway.toml build config with healthcheck - Add PyJWT[crypto] dependency for GitHub App JWT signing - Configure Railway env vars with dynamic service references - Remove .env from tracking (contains sensitive keys) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CI was failing because `ruff check .` picked up 1375 errors from the bundled .docker-sdk/agentfield/ directory which is third-party code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: Railway deployment + GitHub App webhook

- Add diptych hero image comparing traditional single-LLM review vs PR-AF's dynamic plan→review→challenge pipeline - Add vertical funnel architecture diagram showing 7-phase adaptive pipeline with noise reduction metrics - Update README header to "Open-Source Agentic PR Reviewer" - Wrap mermaid diagram in collapsible details block - Add architecture image inline with link to full docs

…field.ai - Remove .ai()/.harness()/meta-prompting from architecture footer - Replace with: dynamic planning · parallel review · adversarial verification · streaming pipeline - Fix footer domain to agentfield.ai on both images

…nore local SDK dirs

Replace /reasoner/pr-af/review with /api/v1/execute/async/pr-af.review across all DX examples. Wrap JSON bodies in {"input": {...}} envelope to match the canonical AgentField control plane API.

Standardize DX pattern — show minimal curl command and example review finding with evidence grounding near top of README.

- Standardize Quick Start to 3-step pattern (clone, env, docker compose) - Delete Dockerfile.local, docker-compose.local.yml, docker-compose.hub.yml - Delete vendored .docker-sdk/ directory (use PyPI agentfield instead) - Update Dockerfile to install agentfield from PyPI

Railpack requires main.py or app.py at the project root to detect the start command. Adds a thin wrapper that sets PYTHONPATH for the src layout and delegates to pr_af.app:main.

fix: provider-aware model routing for claude-code harness

…ider intake_gate and coverage_gate use .ai() (OpenRouter), not .harness(), so they need OpenRouter model IDs even when provider is claude-code. Split tier resolution so .ai() fields always use OpenRouter defaults while .harness() fields use the provider-aware map. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: resolve .ai() fields with OpenRouter tier map

The provider_env() passthrough list was missing CLAUDE_CODE_OAUTH_TOKEN, so the Claude Code SDK subprocess never received the OAuth credentials. Combined with ANTHROPIC_API_KEY being set to "UNUSED" on Railway, this caused every .harness() call to silently fail authentication. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Runs a quick version check and auth test when provider is claude-code so we can see the actual error in Railway logs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add hasCompletedOnboarding to Dockerfile for headless Claude CLI - Filter out placeholder values (UNUSED) from provider env passthrough so ANTHROPIC_API_KEY=UNUSED doesn't override the valid OAuth token Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Claude CLI checks ANTHROPIC_API_KEY before CLAUDE_CODE_OAUTH_TOKEN. If ANTHROPIC_API_KEY is set to a placeholder value like "UNUSED", the CLI rejects it and never falls through to the OAuth token. Remove invalid keys from os.environ at startup so the OAuth flow works. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Allow callers to pass `provider` ("opencode", "claude-code", etc.) on each review request instead of requiring a redeploy to switch harnesses. The env var PR_AF_PROVIDER remains the default when no provider is specified in the request. Threads the provider from the API surface through ReviewConfig, orchestrator, and into every .harness() call where agentfield's per-call provider override takes effect. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rride feat: per-request provider override for harness selection

Railway deployment investigation revealed 7 GB RSS and 2.7 GB disk waste from accumulated Claude Code session data, V8 JIT .so files, and Python heap fragmentation after reviews. Memory fixes: - Set MALLOC_TRIM_THRESHOLD_=0 so glibc returns freed pages to OS - Call malloc_trim(0) via ctypes after each review completes - Clear evidence_map and verification_map after use in review layer - Only extract evidence for new gap findings in coverage loop (was re-extracting for ALL accumulated findings each iteration) - Explicit del of gap_evidence between coverage iterations Disk fixes: - Snapshot Claude session dirs before review, delete only new ones after (safe for concurrent reviews — each cleans only its own sessions) - Clean empty pyright-* temp dirs from /tmp after reviews - Remove orphaned V8 .so files older than 60s from /tmp Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: reduce memory and disk usage from harness subprocess artifacts

The review reasoner had hard-coded 300s wall-clock and $2 cost caps that fire as `BudgetExhaustedError("Budget exhausted before meta-selectors")` on real-world PRs (the docs say a 500-line PR takes 35–50min — 300s never had a chance). Callers like github-buddy intentionally don't thread budget overrides through `app.call`, so deployments had no way to retune without a code change. Read the defaults from env (PR_AF_NO_BUDGET, PR_AF_MAX_DURATION_SECONDS, PR_AF_MAX_COST_USD) at module load. Hard-coded fallbacks preserve the prior behavior for callers that don't set the env. Setting PR_AF_NO_BUDGET=true on a deployment disables enforcement entirely (the existing `no_budget` short-circuit in BudgetConfig). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(budget): env-driven defaults for review reasoner caps

The `_default_tier_map("opencode")` function previously hardcoded the budget tier to gemini-2.5-flash, ignoring PR_AF_MODEL even though the mid and premium tiers honored it. This caused the intake_gate, coverage_gate, and dedup_gate to silently run on gemini when callers expected their PR_AF_MODEL setting to apply everywhere. Budget now falls back to ai_model like the other tiers; PR_AF_MODEL_BUDGET remains available for callers who want to keep gates on a cheaper model. Also updates the stale fallback defaults (minimax-m2.5 / gemini-2.5-flash) to openrouter/moonshotai/kimi-k2.5, matching the docker-compose env override and what deployments actually use. .env.example documents the new defaults and notes that PR_AF_MODEL covers all tiers unless individually overridden. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(config): respect PR_AF_MODEL across all tiers; default to kimi-k2.5

…dimension fan-out Production data on a recent PR showed 8 review_dimensions running ~25 min each but throttled to a concurrency=3 semaphore — wall-clock time was ~3× per-dimension cost (≥75 min for the review phase alone) even though we were paying nothing during the wait. This was the single biggest reason `review` runs were hitting the 7200s caller timeout in github-buddy. Two changes: 1. Default raised 3 → 10. With 6–8 dimensions per typical PR, all of them can now run in parallel and the phase is bounded by the slowest single dimension instead of the semaphore. 2. Surfaced the value as `PR_AF_MAX_CONCURRENT_REVIEWERS` so a deployment can dial it back if its provider hits rate limits. The stagger_delay_seconds (2s) is unchanged and remains the first line of defense against burst-rate-limit hits. The previous comment on this field cited "OpenRouter or other rate-limited providers" — at 10 concurrent on Kimi K2.5 we're well within OpenRouter's per-key limits, and any provider that can't sustain that for a single review session would need configuration anyway. Tests: test_budget_config_defaults updated to pin the new default and to clear PR_AF_MAX_CONCURRENT_REVIEWERS so the assertion reflects the in-code default rather than the runtime env override. Full suite green except for one pre-existing test_cost_tracker failure on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…iewers perf(review): bump max_concurrent_reviewers 3 → 10 to unblock review_dimension fan-out

…the prompt The meta_semantic / meta_mechanical / meta_systemic phases were burning 15–40 minutes apiece in production, even though Kimi K2.6 is fast per-turn. Profiling traced this back to the prompts themselves: each phase had an "Investigation Protocol" section that *mandated* browsing the actual source files, and a "Quality Gate" that *rejected* diff-only answers ("If your rationale says 'visible in the diff' or 'based on the patches', you have not investigated enough"). That language turned every phase into a 20–40 turn opencode session of file reads, when the diff patches plus the intake/anatomy summary already contain almost everything these phases need to produce dimensions. The model wasn't slow — it was being instructed to do expensive exploration before it was allowed to answer. Replaced the mandate with neutral framing: - "Working with Context": diff + intake/anatomy is usually enough. Repo access is available; use it sparingly, only when the diff genuinely doesn't show what you need. - "Quality Bar": dimensions must be specific (named files, line ranges). The diff already carries those; a dimension grounded in the diff is good. Vague dimensions are still rejected. Each lens keeps its specific bullets (semantic = logic/contracts, mechanical = signatures/imports, systemic = patterns/coverage) and its target_files requirement. The output schema is unchanged. What changes is the model is no longer penalized for producing a correct answer from the diff in 2 turns instead of 30. Also bumped the in-code default model openrouter/moonshotai/kimi-k2.5 → k2.6 to match what's actually deployed in production. Local dev was silently running on an older model than prod. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…hen-to-browse perf(meta-phases): let the model decide when to browse, not the prompt

Picks up the cooperative cancellation feature in the AgentField SDK (per-execution cancel callback at /_internal/executions/<id>/cancel, AbortController/asyncio-task wiring in the worker runtime). With this floor, a control-plane cancel-tree on a stuck pr-af run will surface as CancelledError inside the reasoner phase that's currently running, short-circuiting the in-flight LLM call rather than letting it run to completion and discarding the output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(deps): bump agentfield floor to >=0.1.77

Adds the two-env-var knob (OPENCODE_ENABLE_EXA=1 + EXA_API_KEY) to .env.example and a brief subsection in the Quick Start of README. When both vars are set, opencode's built-in websearch / webfetch tools become available to every review reasoner running through the open runtime — model decides per-task whether the lookup is worth doing (verify API contracts, check CVE/deprecation status, confirm library version behavior). No PR-AF wiring is required beyond setting the vars on the deployment; opencode inherits parent env from agentfield's run_cli, and tool awareness is communicated to the model via the standard tool-definition layer (verified: model invokes websearch autonomously when the task warrants it). No prompt-level guardrails added. Review reasoners are short / single- shot / analytic — no loop-risk where unrestrained search could rabbit-hole. SWE-AF's run_coder is the exception (long iterative coding loops); that gets a restraint snippet in its own PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs: document opencode web-search opt-in for review reasoners

The baked-in opencode.json hardcoded both model and small_model to a literal kimi-k2.5 string, so OpenCode's small_model path (used for session titles and prompt compaction) silently bypassed whatever HARNESS_MODEL the deployer set on Railway. Per-call -m on `opencode run` already pins the main model from HARNESS_MODEL via the Python harness provider; this aligns the small_model path the same way. OpenCode supports {env:VAR} interpolation in string fields. Switch model and small_model to {env:HARNESS_MODEL} so they track the env var the rest of the pr-af stack already reads. The Dockerfile already sets HARNESS_MODEL=openrouter/moonshotai/kimi-k2.5 as the image default, so fresh containers without overrides still have a value to interpolate. Also drops the now-redundant `models` registration block — OpenCode auto-handles unknown model names from a configured provider, and pinning a specific name there broke the moment HARNESS_MODEL changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…model fix(opencode): respect HARNESS_MODEL in baked-in opencode.json (small_model path)

Reviewers produced findings that contradicted explicit design rationale from the PR description because the rationale never reached them. - Intake gate / intake fallback / anatomy were truncating the description to 500 / 1000 / 500 chars. Thoughtful PR bodies with rationale past those cutoffs were silently chopped before any model saw them. - review_dimension only received digested summaries (pr_narrative, risk_surfaces, intake_summary) — never the raw description, so the author's voice was laundered through two summarization layers before the reviewer looked at code. - The anatomy prompt also tells the model to discount the description ("what the CODE says, not what the PR description says"), which is correct for divergence detection but leaves the reviewer with no channel for author intent. Bump description truncation in intake/anatomy to 4000 chars uniformly, add a pr_description param to review_dimension with a dedicated "Author's Stated Intent" section near the top of the prompt, and wire self.pr_data.description through _run_parallel_review. The new section is explicitly NOT "trust the author" — it tells the reviewer to verify the code regardless, but to rebut the author's stated reasoning on its merits when a finding contradicts an explicitly-justified design choice, rather than flagging as if the rationale wasn't given. Findings on points the description is silent about are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

santoshkumarradha and others added 30 commits March 10, 2026 08:21

fix: Dockerfile sdk path, add python-dotenv/fastapi deps, .env, README

19f9e6b

ci: add GitHub Actions for linting and Docker build

08618fb

feat: add github actions ci/cd integration

5aa69d3

- Added ci_runner.py to handle async execution polling - Updated README with drop-in GitHub Actions workflow template - Configured to use native GITHUB_TOKEN for zero-config deployment - Set workflow trigger to act on 'pr-af' label

feat: add branded footers to GitHub comments

2207b6a

- Appended 'Reviewed by AgentField PR-AF' to all inline comments - Added stylish AgentField shield badge to main PR summary - Pointed internal links to the new Agent-Field/pr-af repo

lint fixes

0995af6

fix: scope ruff lint to src/ and scripts/ only

62b5010

CI was failing because `ruff check .` picked up 1375 errors from the bundled .docker-sdk/agentfield/ directory which is third-party code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge pull request #1 from Agent-Field/feat/railway-deployment

8d410a6

feat: Railway deployment + GitHub App webhook

fix: use PyPI agentfield in Dockerfile.local, add missing deps, gitig…

3a43c89

…nore local SDK dirs

chore: remove tracked __pycache__/ files (already in .gitignore)

55f157c

fix: standardize API endpoints to control plane pattern

d8ab02c

Replace /reasoner/pr-af/review with /api/v1/execute/async/pr-af.review across all DX examples. Wrap JSON bodies in {"input": {...}} envelope to match the canonical AgentField control plane API.

docs: add One-Call DX section with curl and example finding

3882c7f

Standardize DX pattern — show minimal curl command and example review finding with evidence grounding near top of README.

chore: add root main.py for Railpack auto-detection

308365b

Railpack requires main.py or app.py at the project root to detect the start command. Adds a thin wrapper that sets PYTHONPATH for the src layout and delegates to pr_af.app:main.

AbirAbbas and others added 29 commits March 13, 2026 18:44

Merge pull request #13 from Agent-Field/fix/claude-code-railway-runtime

3376a23

fix: provider-aware model routing for claude-code harness

Merge pull request #14 from Agent-Field/fix/ai-field-resolution

9061c87

fix: resolve .ai() fields with OpenRouter tier map

uniq dir for prs

60b9404

debug: add Claude CLI diagnostic on startup

fa56d27

Runs a quick version check and auth test when provider is claude-code so we can see the actual error in Railway logs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

debug: capture stdout + key prefix in Claude CLI diagnostic

52ec0c6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: wrap long intake_phase signature to satisfy line length lint

274d3b5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge pull request #15 from Agent-Field/feat/per-request-provider-ove…

fcfadd4

…rride feat: per-request provider override for harness selection

Merge pull request #16 from Agent-Field/fix/memory-leak-cleanup

4658417

fix: reduce memory and disk usage from harness subprocess artifacts

Merge pull request #17 from Agent-Field/feat/env-driven-budget-defaults

afd1f37

feat(budget): env-driven defaults for review reasoner caps

Merge pull request #18 from Agent-Field/fix/respect-user-model-config

743f078

fix(config): respect PR_AF_MODEL across all tiers; default to kimi-k2.5

Merge pull request #19 from Agent-Field/perf/raise-max-concurrent-rev…

50f14ea

…iewers perf(review): bump max_concurrent_reviewers 3 → 10 to unblock review_dimension fan-out

Merge pull request #20 from Agent-Field/perf/let-meta-phases-decide-w…

07969c2

…hen-to-browse perf(meta-phases): let the model decide when to browse, not the prompt

Merge pull request #21 from Agent-Field/chore/bump-agentfield-0.1.77

d8b8d4b

chore(deps): bump agentfield floor to >=0.1.77

Merge pull request #22 from Agent-Field/docs/web-search-via-opencode-exa

abe6790

docs: document opencode web-search opt-in for review reasoners

Merge pull request #24 from Agent-Field/fix/opencode-respect-harness-…

a350c74

…model fix(opencode): respect HARNESS_MODEL in baked-in opencode.json (small_model path)

santoshkumarradha force-pushed the main branch from 8824ead to e064a37 Compare May 11, 2026 15:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(reviewer): surface author rationale to reviewer agents#25

fix(reviewer): surface author rationale to reviewer agents#25
AbirAbbas wants to merge 86 commits into
mainfrom
fix/reviewer-sees-author-intent

AbirAbbas commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AbirAbbas commented May 7, 2026

Summary

Changes

Background

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants