feat(azure-compute): add VM creation workflow with approval gate before deploy by deankroker · Pull Request #2297 · microsoft/GitHub-Copilot-for-Azure

deankroker · 2026-05-18T22:15:00Z

What this does

Before this PR, asking Copilot for Azure to "create a VM" jumped straight to generating Bicep or running az vm create. The user had no chance to review what was about to be deployed, and for VM scale sets the model often missed the intent entirely.

This PR adds a guided workflow so the same request now:

1. Asks 2–3 short questions to figure out what the user actually needs (region, size, OS, network exposure)
2. Shows a Plan Card: a 14–21 row summary of every parameter the deployment will use, with each value labeled by where it came from (user input, default, recommendation)
3. Waits for the user to approve, edit, or change format, all in a single picker rather than three sequential prompts
4. Only then generates Bicep / Terraform / az CLI / MCP calls

It also recognizes VMSS intent without the user having to say "VMSS" (e.g. "fleet that autoscales" → Flexible orchestration scale set), and refuses to deploy templates that would leave SSH open to the internet.
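To make the gate concrete, here is a minimal Python sketch of the idea: each Plan Card row carries its provenance, and nothing is generated until the plan is approved. The names `PlanRow`, `Source`, `render_plan_card`, and `generate_artifact` are illustrative assumptions, not identifiers from the skill, which is specified in markdown rather than code.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where a Plan Card value came from (labels taken from the PR description)."""
    USER = "user input"
    DEFAULT = "default"
    RECOMMENDATION = "recommendation"


@dataclass(frozen=True)
class PlanRow:
    # Hypothetical row shape; the real schema lives in plan-card.md.
    name: str
    value: str
    source: Source


def render_plan_card(rows: list[PlanRow]) -> str:
    """Render one line per parameter, with its provenance in the last column."""
    lines = ["Parameter | Value | Source", "--- | --- | ---"]
    lines += [f"{r.name} | {r.value} | {r.source.value}" for r in rows]
    return "\n".join(lines)


def generate_artifact(rows: list[PlanRow], approved: bool) -> str:
    """Emit an az CLI command only after approval; other formats are out of scope here."""
    if not approved:
        raise PermissionError("Plan Card not approved; nothing is generated")
    p = {r.name: r.value for r in rows}
    return (
        f"az vm create --resource-group {p['resource_group']} --name {p['name']} "
        f"--location {p['region']} --size {p['size']} --image {p['image']}"
    )


# Example values are made up for illustration.
EXAMPLE_ROWS = [
    PlanRow("resource_group", "rg-demo", Source.USER),
    PlanRow("name", "vm-demo", Source.USER),
    PlanRow("region", "eastus", Source.DEFAULT),
    PlanRow("size", "Standard_B1s", Source.RECOMMENDATION),
    PlanRow("image", "Ubuntu2204", Source.DEFAULT),
]
```

Calling `generate_artifact(EXAMPLE_ROWS, approved=False)` raises, which is the property the Plan Card gate is meant to guarantee.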

What's in the PR

Workflow + supporting docs

57789bd — new vm-creator workflow under plugin/skills/azure-compute/, with depth-probe references (beginner / cost / networking / security / spec) so the question set adapts to user expertise. Output adapters for az CLI, Bicep, Terraform, and Azure MCP. Trims vm-recommender.md and pulls verbose web_fetch rules into a reference file.
a7c7954 — Plan Card approval is now one batched picker (approve / edit / change format) instead of three sequential questions.

Plumbing

e729b2f — switches the Azure MCP CLI to --mode single, collapsing ~26 per-service tool schemas (~30K tokens) behind one routing tool (~3K tokens). Same APIs, same auth — the schemas just don't load until the tool is called. This is what makes the workflow fit in context without truncating.

Eval coverage

4592b1f — evals/azure-compute/eval.yaml with 16 stimuli covering router accuracy, depth-probe branches, VMSS recognition, Plan Card gate, and output adapter coverage. Smoke tier (5 × 3) was 3/5 flaky before; this PR brings it to 5/5 passing 3-for-3.
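As a rough illustration of how "5 × 3" trial results can be tallied, the Python sketch below classifies each stimulus as stable, flaky, or failing. The function name `summarize` and the stimulus IDs are hypothetical; this is not how Vally itself reports results.

```python
def summarize(trials: dict[str, list[bool]]) -> dict:
    """Tally pass/fail outcomes per stimulus across repeated trials."""
    total = sum(len(results) for results in trials.values())
    passed = sum(sum(results) for results in trials.values())
    return {
        "passed": passed,
        "total": total,
        # Stable: every trial passed (e.g. 3-for-3).
        "stable": [n for n, r in trials.items() if all(r)],
        # Flaky: some trials passed and some failed.
        "flaky": [n for n, r in trials.items() if any(r) and not all(r)],
        # Failing: no trial passed.
        "failing": [n for n, r in trials.items() if not any(r)],
    }


# Hypothetical smoke tier: 5 stimuli, 3 trials each, all passing.
SMOKE = {f"stimulus-{i}": [True, True, True] for i in range(1, 6)}
```

With the all-passing `SMOKE` data, `summarize` reports 15 of 15 trials passed and an empty `flaky` list.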

Follow-ups during review

710e3f7 — fixes Skill Structure CI: a few cross-file references pointed at paths that didn't exist after the workflow split.
a590be1 — rewrites the skill frontmatter description: from a >- folded scalar to an inline double-quoted string. skills.sh only accepts the inline form.
ef08203 — removes the '*' default on sshSourceAddressPrefix (Bicep) and ssh_source_address_prefix (Terraform). The templates now require the caller to pass their own IP or a trusted CIDR; opening port 22 to the entire internet has to be an explicit, accepted choice. Resolves the open Copilot security suggestions.
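The rule enforced by ef08203 (no implicit `'*'`, and an explicit opt-in for the open-to-internet case) can be sketched in Python as follows. The function `validate_ssh_source` and its `accept_open_internet` flag are assumptions made for illustration; the actual templates enforce this by declaring the Bicep parameter and the Terraform variable without a default.

```python
import ipaddress
from typing import Optional


def validate_ssh_source(prefix: Optional[str], accept_open_internet: bool = False) -> str:
    """Return a normalized SSH source prefix, or raise if it is missing or unsafe."""
    if prefix is None or not prefix.strip():
        raise ValueError("SSH source prefix is required; pass your own IP or a trusted CIDR")
    prefix = prefix.strip()
    if prefix == "*":
        # Opening port 22 to the internet must be an explicit, accepted choice.
        if not accept_open_internet:
            raise ValueError("'*' exposes port 22 to the internet; accept the risk explicitly")
        return "*"
    # strict=False normalizes host bits, so a bare IP becomes a /32 network.
    return str(ipaddress.ip_network(prefix, strict=False))
```

A bare address such as `203.0.113.7` comes back as `203.0.113.7/32`, matching the `/32` form the quickstart passes to the templates.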

Why it matters

The visible change: a user asking for a VM gets to see and edit what's about to be deployed, instead of finding out after the fact. The eval suite locks the behavior in so it doesn't regress.

The invisible change (e729b2f): the whole workflow used to run out of context room around turn 3 because the Azure MCP server was loading every service's tool schema up front. With --mode single, schemas load on demand and the workflow holds end-to-end.

Test plan

vally run evals/azure-compute/eval.yaml --tier smoke — 5/5 stimuli × 3 trials
Manual: "I need a small Ubuntu VM to test something" → beginner depth probe → 2 questions → Plan Card → batched picker. PASS (5 turns, $0.42)
Manual: "set up an autoscaling web tier for ~500 rps" → VMSS recognized without user saying VMSS → Flexible orchestration recommended. PASS (4 turns, $0.39)
Manual: "apply via MCP" end-to-end — verify the single MCP routing tool stays responsive past turn 3. PASS through 16 turns (quota validation caught zero D2s_v5 quota, model auto-switched to B1s). Apply step was cancelled at the permission prompt, which is orthogonal to the schema-cost fix being tested.
No regressions in vm-recommender standalone (recommend-only, no hand-off to creator). PASS (1 turn, $0.30)
All 11 CI checks pass on the latest commit (ef08203)

Manual items run via the claude-copilot proxy on claude-sonnet-4.6 with the plugin loaded from a working tree clone.

…der token limits

Adds the vm-creator workflow (Plan-Card-gated VM/VMSS provisioning with Azure MCP, az CLI, Bicep, and Terraform output adapters) and refactors the existing azure-compute skill files to stay well under CI token limits:

- SKILL.md 1,840 -> 382 tokens
- vm-recommender.md 3,006 -> 1,573 tokens (extracted handoff-to-creator and web-fetch-policy refs)
- vm-creator.md new; 1,676 tokens

vm-creator splits its larger references into sub-folders so every file sits under the 1,000-token soft limit:

- references/depth-probe/ (6 files, max 755t)
- references/output-adapters/ (5 files, max 760t)
- references/delivery-options/ (4 files, max 817t)
- references/{plan-card,validation-gates,mcp-tools}.md (3 standalone)

Bicep and Terraform templates moved to examples/ as code files so they don't count against markdown token budgets.

The skill now consumes the four new compute discovery tools shipping in the paired mcp PR (vm list-skus, vm list-images, vm check-quota, vm recommend-region) as its Step 4 validation gates.

Zero hard-limit violations and zero soft warnings on any file in this PR.

The bundled Azure MCP server registers one tool per service namespace by default (~26 tools: compute, storage, keyvault, monitor, sql, ...). Combined with the azure-compute skill's loaded reference files, the total context routinely crosses Claude Code's autocompact threshold within 3 turns, causing thrash errors during vm-creator flows.

Passing --mode single collapses all namespaces into one routing tool (per the upstream @azure/mcp CLI). Schema cost drops from ~30K to ~3K tokens with zero functionality loss; service routing happens inside the single tool via its command arg.

Verified: echo "create a vm" | claude --plugin-dir output --print

- before: autocompact thrash after 3 turns
- after: 24.9s, 145K input tokens, full Plan Card rendered
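The "one routing tool" idea can be pictured with a small Python sketch: a single entry point selects a per-service handler from its command argument, so only one schema has to be advertised up front. The handler table, the command strings, and the `route` function are hypothetical; the real @azure/mcp tool schema and command grammar are not reproduced here.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]

# Hypothetical per-service handlers standing in for the ~26 service namespaces.
HANDLERS: Dict[str, Handler] = {
    "compute": lambda args: f"compute handled {sorted(args)}",
    "storage": lambda args: f"storage handled {sorted(args)}",
}


def route(command: str, args: dict) -> str:
    """Single entry point: the first word of the command picks the service."""
    namespace, _, _rest = command.partition(" ")
    handler = HANDLERS.get(namespace)
    if handler is None:
        raise KeyError(f"unknown service namespace: {namespace!r}")
    return handler(args)


def schema_tokens_saved(per_service_total: int = 30_000, single_tool: int = 3_000) -> int:
    """Approximate up-front saving, using the rough figures quoted in the commit."""
    return per_service_total - single_tool
```

With the quoted estimates, the up-front schema cost drops by roughly 27K tokens, about 90% of the original figure.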

… action picker

The old approve → output-format → delivery flow was 3 sequential popups for what's effectively one decision ("how do you want this delivered?"). This trims plan-card.md to specify a single AskUserQuestion with 4 likely combinations, an explicit "Edit a row" branch, and an early-skip when the user's original prompt already implied the delivery choice.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

…SS, plan card, adapters

16 stimuli across the azure-compute skill surface area:

- Router accuracy (vm-creator vs vm-recommender vs azure-prepare handoff)
- Depth-probe branch selection (beginner / spec / cost / networking / security)
- VMSS recognition without the user saying "VMSS" (agent pools, runner fleets)
- Plan Card gate (no artifact emitted before approval, even under pressure)
- Output adapter coverage (az CLI / Bicep / Terraform / MCP apply)
- Security posture (no open SSH/RDP, no plaintext passwords, managed identity preferred)

Smoke tier (5 stimuli, --tag tier=smoke) currently passes 15/15 trials with zero flakes against claude-sonnet-4.6.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Copilot

Pull request overview

This PR expands the azure-compute skill from recommendation/troubleshooting routing into a fuller VM/VMSS creation workflow, adds MCP single-tool mode context optimization, and introduces an eval suite to guard routing and Plan Card behavior.

Changes:

Adds a new vm-creator workflow with depth probes, validation gates, Plan Card approval, output adapters, delivery options, and example Bicep/Terraform artifacts.
Updates the existing VM recommender and main skill router to hand off provisioning requests to vm-creator.
Adds Azure Compute Vally eval coverage and switches Azure MCP startup to --mode single.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 7 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `plugin/skills/azure-compute/SKILL.md` | Updates router description and adds `vm-creator` routing. |
| `plugin/.mcp.json` | Starts Azure MCP in single-tool mode. |
| `evals/azure-compute/eval.yaml` | Adds azure-compute routing/workflow eval suite. |
| `plugin/skills/azure-compute/workflows/vm-recommender/vm-recommender.md` | Trims recommender workflow and adds creator handoff. |
| `plugin/skills/azure-compute/workflows/vm-recommender/references/web-fetch-policy.md` | Adds live-doc verification fallback policy. |
| `plugin/skills/azure-compute/workflows/vm-recommender/references/handoff-to-creator.md` | Defines recommender-to-creator handoff rules. |
| `plugin/skills/azure-compute/workflows/vm-creator/vm-creator.md` | Adds top-level VM/VMSS creation workflow. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/validation-gates.md` | Documents required pre-Plan Card validation checks. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/plan-card.md` | Defines Plan Card schema and approval picker. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/mcp-tools.md` | Documents MCP commands and CLI fallbacks. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/index.md` | Adds depth-probe branch selection. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/beginner.md` | Adds beginner fast-path defaults. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/cost-deep.md` | Adds cost-focused questioning guidance. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/networking-deep.md` | Adds networking-focused questioning guidance. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/security-deep.md` | Adds security-focused questioning guidance. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/depth-probe/spec-deep.md` | Adds SKU/spec-focused questioning guidance. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/index.md` | Adds shared output adapter routing and field mapping. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/az-cli.md` | Adds az CLI output adapter. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/bicep.md` | Adds Bicep output adapter. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/terraform.md` | Adds Terraform output adapter. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/output-adapters/mcp-apply.md` | Adds live Azure MCP apply adapter. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/delivery-options/index.md` | Adds delivery mode decision logic. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/delivery-options/print.md` | Adds print-in-chat delivery mode. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/delivery-options/save-local.md` | Adds local file save delivery mode. |
| `plugin/skills/azure-compute/workflows/vm-creator/references/delivery-options/github-pr.md` | Adds GitHub PR delivery mode. |
| `plugin/skills/azure-compute/workflows/vm-creator/examples/bicep/main.bicep` | Adds deployable Bicep VM example. |
| `plugin/skills/azure-compute/workflows/vm-creator/examples/bicep/README.md` | Adds Bicep example usage documentation. |
| `plugin/skills/azure-compute/workflows/vm-creator/examples/terraform/main.tf` | Adds deployable Terraform VM example. |
| `plugin/skills/azure-compute/workflows/vm-creator/examples/terraform/variables.tf` | Adds Terraform input variables. |
| `plugin/skills/azure-compute/workflows/vm-creator/examples/terraform/outputs.tf` | Adds Terraform outputs. |
| `plugin/skills/azure-compute/workflows/vm-creator/examples/terraform/README.md` | Adds Terraform example usage documentation. |

- Fix mcp-tool-references test by rephrasing line 38 of mcp-tools.md to avoid the literal `azure__compute_vm_check-quota` regex match (the prose was being parsed as a tool reference even though it was just illustrating sub-command vs tool distinction)
- Update triggers.test.ts to reflect the broader azure-compute scope: remove prompts from the no-trigger list that now legitimately trigger due to generic compute keywords ("create"/"deploy"/"server") in the expanded description; add comment explaining the router's disambiguation rule forwards non-VM intents to the right skill
- Regenerate triggers snapshot for the expanded keyword set
- Address 7 Copilot review comments across bicep/terraform examples, depth-probe, plan-card, vm-creator, vm-recommender, and adapter docs

…h format rule

The frontmatter validator (description-format rule) rejects >- folded scalars because skills.sh in the runtime is incompatible with them. Convert the azure-compute SKILL.md description from a >- folded multi-line scalar to an inline double-quoted string. Parsed text is identical, so trigger snapshots and keyword extraction are unchanged.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

…form examples

Both vm-creator templates previously defaulted the SSH allow-rule source to '*', leaving port 22 open to the entire internet on any deployment that didn't override the parameter. An exposed SSH endpoint gets credential-stuffed within minutes of going public.

Make the parameter required (no default) and document the security tradeoff on the description so callers must pass either their own /32 or a trusted CIDR before the template will deploy. Quickstart now derives MY_IP from ifconfig.me and passes it explicitly; cleanup command updated to match.

Addresses Copilot review comments on PR microsoft#2297.

deankroker · 2026-05-18T23:52:27Z

Pushed ef08203 to address the open Copilot review suggestions. Audited all 6; 3 had real underlying problems that needed code changes, 3 were already correct as written.

Fixed (ef08203):

| File | Change |
| --- | --- |
| `examples/bicep/main.bicep` | `sshSourceAddressPrefix` no longer defaults to `'*'` (now required); description explains the tradeoff |
| `examples/terraform/variables.tf` | `ssh_source_address_prefix` no longer defaults to `"*"` (now required); description explains the tradeoff |
| `examples/terraform/README.md` | Quickstart derives `MY_IP=$(curl -s ifconfig.me)/32` and passes it explicitly to plan/apply; variables table marks the var required; Notes call out why; Cleanup command updated to match |

Templates now refuse to deploy without an explicit source prefix — the open-to-internet path requires the user to pass "*" and accept the risk.

Audited, no change needed:

| Comment | Why no change |
| --- | --- |
| `plan-card.md` "4 options" vs 6 listed | File already says "6 options" on the relevant line; suggestion was based on stale context |
| `mcp-apply.md` `familyPrefix` → `family` | The doc uses `familyPrefix` for `compute_vm_list-skus` (correct param for that command) and `family` for `compute_vm_check-quota` (correct param for that command); confirmed against `references/mcp-tools.md` |
| `depth-probe/beginner.md` SSH/RDP defaults to `*` | Beginner path already defaults to the user's current public IP via `curl -s ifconfig.me`; `*` is only the fallback if detection fails, and is always flagged with ⚠ |

Watching CI; will flag if anything regresses.

Azure MCP uses AzureCliCredential (not DefaultAzureCredential) to pick up the `az login` session. Addresses review comment from JasonYeMSFT on PR microsoft#2297.

Plugin-wide change should not ride on a single skill's PR. Splitting the context-savings work to a follow-up PR scoped to plugin config. Addresses review comment from JasonYeMSFT on PR microsoft#2297.

Per review feedback, omit the brittle `AzureCliCredential` reference from the MCP apply prerequisite. State the user-facing requirement (be signed in to Azure, e.g. `az login`) and treat the exact credential mechanism as an MCP server implementation detail.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Per JasonYeMSFT review on PR microsoft#2297 (thread 3270086905, eval.yaml:757): when the user combines an explicit deliverable ("give me the Bicep") with an explicit refusal of dialog ("no questions", "skip planning"), the vm-creator workflow should honor that intent rather than overrule it with a mandatory Plan Card approval popup.

vm-creator Step 5 now has two paths:

- Default: render the full Plan Card table + Approve/Edit AskUserQuestion
- Explicit-override fast path: emit a one-line preview of high-signal decisions + the requested artifact, no popup. Step 4 validation gates (SKU / image / quota / region) still run.

The plan-card-gate-001 eval stimulus is renamed to plan-card-override-001. Prompt is unchanged (same explicit override), graders flipped to verify the new contract: Bicep must emit directly, must name the SKU, must NOT render the full Plan Card markdown table, must NOT block on the Approve/Edit AskUserQuestion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
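The two Step 5 paths described in this commit can be sketched as a small decision function in Python. `Intent`, `choose_step5_path`, and `step5` are hypothetical names; in the skill itself the decision is made by the model following vm-creator.md, not by code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    names_deliverable: bool  # e.g. "give me the Bicep"
    refuses_dialog: bool     # e.g. "no questions", "skip planning"


def choose_step5_path(intent: Intent) -> str:
    """The fast path needs BOTH an explicit deliverable and an explicit refusal of dialog."""
    if intent.names_deliverable and intent.refuses_dialog:
        return "fast-path"
    return "plan-card"


def step5(intent: Intent, gates_passed: bool, sku: str) -> str:
    """Step 4 validation gates apply on both paths before anything is shown or emitted."""
    if not gates_passed:
        raise RuntimeError("Step 4 validation gates must pass before Step 5")
    if choose_step5_path(intent) == "fast-path":
        # One-line preview that names the SKU, then the artifact, with no popup.
        return f"Preview: {sku}. Emitting the requested artifact without an approval popup."
    return "Render the full Plan Card table and ask Approve/Edit."
```

Naming a deliverable alone ("give me the Bicep") still takes the default Plan Card path in this sketch; only the combination of both signals skips the popup.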

…muli

Per JasonYeMSFT review on PR microsoft#2297 (thread 3270067015, eval.yaml:91): output-text regex was a fragile proxy for routing validation. Migrate the four router smoke stimuli to the skill-invocation grader, which checks the actual trajectory's skill_activation events rather than scraping response text.

Changes:

- route-vm-creator-001: required=[azure-compute], disallowed=[azure-prepare]
- route-vmss-001: required=[azure-compute]
- route-vm-recommender-001: required=[azure-compute]
- route-away-azure-prepare-001: disallowed=[azure-compute] (the anti-pattern is azure-compute silently provisioning a VM; both azure-prepare and azure-deploy are legitimate handoff targets)

Content-checking graders (Ubuntu mention, VMSS substring, SKU patterns, pricing references, no-VM-artifact anti-pattern) are kept on output-text regex; those are behavioral assertions about response content, distinct from routing.

Adds skill-invocation: 1.5 to scoring weights (mirrors azure-enterprise-infra-planner) so routing-miss failures are weighted appropriately.

Verified end-to-end: smoke tier (5 stimuli, 1 run each) passes at 100% against claude-sonnet-4.6.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
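The required/disallowed check that this commit describes can be approximated in a few lines of Python. This is a sketch of the idea only: `grade_skill_invocation` and the flat list of activated skill names are assumptions, and the real grader's event format and API are not shown in this PR.

```python
from typing import Iterable, List


def grade_skill_invocation(
    activated: List[str],
    required: Iterable[str] = (),
    disallowed: Iterable[str] = (),
) -> bool:
    """Pass when every required skill activated and no disallowed skill did."""
    seen = set(activated)
    return all(s in seen for s in required) and not any(s in seen for s in disallowed)


# The four router stimuli and their expectations, as listed in the commit message.
ROUTER_CASES = {
    "route-vm-creator-001": {"required": ["azure-compute"], "disallowed": ["azure-prepare"]},
    "route-vmss-001": {"required": ["azure-compute"], "disallowed": []},
    "route-vm-recommender-001": {"required": ["azure-compute"], "disallowed": []},
    "route-away-azure-prepare-001": {"required": [], "disallowed": ["azure-compute"]},
}
```

For example, a trajectory that activated only `azure-deploy` passes `route-away-azure-prepare-001`, since that case only forbids `azure-compute`.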

…from route-vmss-001

Per review feedback, output-not-matches graders are prone to false-positive failures because agent output is stochastic. Drop the open-management-port not-match grader from the route-vmss-001 case Jason flagged. The same risky-NSG behavior is covered elsewhere in the workflow via skill content and Plan Card rendering, so the eval signal is preserved without the brittle negative pattern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

deankroker and others added 4 commits May 17, 2026 09:39

Copilot AI review requested due to automatic review settings May 18, 2026 22:15

deankroker requested review from alex-thompson, jongio, joybb and rakal-dyh as code owners May 18, 2026 22:15

Copilot started reviewing on behalf of deankroker May 18, 2026 22:20

Copilot AI reviewed May 18, 2026

deankroker and others added 3 commits May 18, 2026 16:12

deankroker changed the title ~~feat(azure-compute): vm-creator workflow + MCP context fix + eval suite~~ feat(azure-compute): add VM creation workflow with approval gate before deploy May 19, 2026

JasonYeMSFT requested changes May 19, 2026

deankroker and others added 6 commits May 20, 2026 12:25

docs(azure-compute): correct Azure MCP auth credential class

625c8b3

revert(azure-compute): drop Azure MCP --mode single from plugin config

f08a3b7

github-actions Bot mentioned this pull request May 21, 2026

[repo-status] Weekly Repo Status — May 15–21, 2026 #2358

Open

deankroker requested a review from JasonYeMSFT May 21, 2026 21:40

deankroker wants to merge 13 commits into microsoft:main from deankroker:feat/azure-compute-skill


4 participants
