DRAFT: Add nightly integration test workflow by xingyaoww · Pull Request #385 · OpenHands/agent-canvas

xingyaoww · 2026-05-12T18:23:55Z

A human has tested these changes.

Why

We need a nightly integration test to catch regressions in the live Agent Server ↔ frontend stack, similar to the integration-runner in software-agent-sdk.

Summary

Adds .github/workflows/nightly-integration.yml — a GitHub Actions workflow that runs the Playwright live E2E suite on a recurring schedule and on demand.

Triggers

Trigger	When
Nightly schedule	10:30pm UTC every day (main repo only)
`workflow_dispatch`	Manual trigger with optional `model`, `pr_number`, and `reason` inputs
`integration-test` label	When added to a same-repo PR (via `pull_request_target`)
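
As a rough illustration of the three triggers in the table, the `on:` block of such a workflow could look like the sketch below. This is not the contents of the PR's nightly-integration.yml: the input descriptions are assumptions, and the "main repo only" restriction is not shown because it would typically be a job-level `if:` condition rather than part of `on:`.

```yaml
# Sketch only; not the actual workflow file from this PR.
on:
  schedule:
    # 22:30 UTC every day (10:30pm UTC)
    - cron: "30 22 * * *"
  workflow_dispatch:
    inputs:
      model:
        description: "LLM to test against (description text is an assumption)"
        required: false
        type: string
        default: openhands/claude-haiku-4-5-20251001
      pr_number:
        description: "PR to report results on (description text is an assumption)"
        required: false
        type: string
      reason:
        description: "Why the run was started (description text is an assumption)"
        required: false
        type: string
  pull_request_target:
    # Fires when any label is added; the job still has to check that the
    # label is `integration-test` and that the PR is not from a fork.
    types: [labeled]
```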

What it does

Starts a real local Agent Server + UI stack via npm run test:e2e:live
Runs the Playwright live E2E suite against a real LLM (default: openhands/claude-haiku-4-5-20251001 via the OpenHands LLM proxy)
Uploads Playwright reports and test results as artifacts (30-day retention)
Posts results as PR comments when triggered from a PR
Logs results to stdout for scheduled runs (tracker issue placeholder — update the issue number when one is created)
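
The steps above might map onto a job roughly like the following sketch. The job name, Node version, artifact name, and artifact paths (`playwright-report/` and `test-results/` are Playwright's default output locations) are assumptions, not values taken from the PR.

```yaml
# Sketch only; names and paths are assumptions.
jobs:
  live-e2e:  # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20  # assumed version
      - run: npm ci
      - name: Run Playwright live E2E suite
        run: npm run test:e2e:live
      - name: Upload Playwright report and test results
        if: always()  # keep artifacts even when tests fail
        uses: actions/upload-artifact@v4
        with:
          name: playwright-live-e2e  # hypothetical artifact name
          path: |
            playwright-report/
            test-results/
          retention-days: 30
      - name: Comment on the PR
        if: always() && github.event_name == 'pull_request_target'
        env:
          GH_TOKEN: ${{ github.token }}
        run: >-
          gh pr comment "${{ github.event.pull_request.number }}"
          --repo "${{ github.repository }}"
          --body "Live E2E result: ${{ job.status }}"
```

For the comment step to succeed, the job's token would need `pull-requests: write` permission.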

Security

Fork PRs are excluded from pull_request_target via explicit same-repo check
LLM credentials are only injected into the test step, not at job level
persist-credentials: false prevents checkout from leaking tokens
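
A minimal sketch of how these three safeguards could be expressed, assuming a job named `live-e2e` and a secret named `LLM_API_KEY` (both hypothetical; the PR does not state the names used). The `ref:` line is also an assumption about how the PR head would be checked out.

```yaml
# Sketch only; job, secret, and variable names are hypothetical.
jobs:
  live-e2e:
    runs-on: ubuntu-latest
    # Scheduled and manual runs pass through. A pull_request_target run
    # must carry the integration-test label and come from a branch in
    # this repository, so fork PRs never reach the steps below.
    if: >-
      github.event_name != 'pull_request_target' ||
      (github.event.label.name == 'integration-test' &&
       github.event.pull_request.head.repo.full_name == github.repository)
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          # Fall back to the triggering commit when there is no PR payload.
          ref: ${{ github.event.pull_request.head.sha || github.sha }}
          # Do not leave the token in the local git config.
          persist-credentials: false
      - name: Run live E2E suite
        run: npm run test:e2e:live
        env:
          # Credentials are scoped to this step only, not to the job.
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
```

Scoping `env:` to the test step means earlier steps, such as dependency installation, never see the LLM credentials.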

TODO (follow-ups)

Create a tracking issue for nightly results and wire up the gh issue comment call
Add multi-model matrix support (like software-agent-sdk's resolve_model_config.py)
Add consolidated report generation across models
Consider adding more live E2E test scenarios beyond the current terminal conversation smoke test

Type

Notes

This PR was created by an AI agent (OpenHands) on behalf of the user.

@xingyaoww can click here to continue refining the PR

Adds a scheduled GitHub Actions workflow that runs the live Agent Server E2E tests nightly, similar to the integration-runner in software-agent-sdk.

Triggers:
- Nightly schedule (10:30pm UTC)
- Manual workflow_dispatch with optional model/PR number overrides
- PR label 'integration-test' (same-repo PRs only, via pull_request_target)

The workflow:
- Starts a real local Agent Server + UI stack via npm run test:e2e:live
- Runs the Playwright live E2E suite against the configured LLM
- Uploads Playwright reports and test results as artifacts
- Posts results as PR comments when triggered from a PR
- Posts results to a tracking issue for scheduled runs (placeholder)

Default model: openhands/claude-haiku-4-5-20251001 via the OpenHands LLM proxy.

Co-authored-by: openhands <openhands@all-hands.dev>

vercel · 2026-05-12T18:24:00Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agent-canvas	Error		May 12, 2026 6:24pm

github-actions · 2026-05-12T18:24:07Z

PR Artifacts Notice

This PR contains a .pr/ directory with PR-specific artifacts. This directory will be automatically removed when the PR is approved.

Fork PRs require manual cleanup before merging.

vercel Bot had a problem deploying to Preview May 12, 2026 18:24 Failure

xingyaoww wants to merge 1 commit into main from nightly-integration-tests

2 participants
