[codex] fix server LLM model auto-selection by xuruiray · Pull Request #577 · THU-MAIC/OpenMAIC

xuruiray · 2026-05-17T06:29:53Z

Summary

Fix server-configured LLM startup so a provider selected by server sync also gets a usable default model when the current modelId is empty.

Related Issues

Fixes #498

Changes

Auto-fill the first available model for a server-configured LLM provider when server sync falls back from an unusable default provider, matching the existing image/video recovery behavior.
Add a regression test that mirrors the reported DeepSeek server-only setup: default openai provider, empty modelId, server response containing deepseek with deepseek-chat.
Add a narrow TypeScript compatibility fix for the OpenAI-compatible fetch wrapper so next build passes with the current Next/TypeScript fetch type.

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Refactoring (no functional changes)
CI/CD or build changes

Verification

Steps to reproduce / test

Start from a store state with providerId: openai and an empty modelId.
Mock /api/server-providers to return only deepseek with models: ['deepseek-chat'].
Run fetchServerProviders() and verify the store selects providerId: deepseek and modelId: deepseek-chat.
Run the full local validation commands listed below.

What you personally verified

RED: pnpm vitest run tests/store/settings-server-sync.test.ts -t "selects the server LLM model" failed before the fix with expected '' to be 'deepseek-chat'.
GREEN: the same targeted test passed after the store sync change.
pnpm vitest run tests/store/settings-server-sync.test.ts passed: 47 tests.
pnpm test passed: 45 test files, 334 tests.
pnpm lint && pnpm check && pnpm build passed. Lint reports 7 existing warnings and 0 errors.
pnpm exec playwright install chromium was run because the local Playwright browser cache was missing.
CI=1 pnpm test:e2e passed: 11 Playwright tests. I used CI mode because a pre-existing next dev process held .next/dev/lock on port 3000; CI mode starts the production server on 3002 without touching that process.

Evidence

CI-style local checks passed (pnpm lint && pnpm check && pnpm build)
Unit/integration tests passed (pnpm test)
E2E tests passed (CI=1 pnpm test:e2e)
Manually verified the regression through a dedicated store sync test

Checklist

My code follows the project's coding style
I have performed a self-review of my code
I have added/updated documentation as needed
My changes do not introduce new warnings

Summary: - Auto-fill the first server-configured LLM model when provider sync falls back from an unusable default provider. - Add a regression test for the DeepSeek server-only configuration reported in THU-MAIC#498. - Keep the OpenAI-compatible fetch wrapper buildable with Next's fetch type. Rationale: - Server-configured providers were marked usable, but an empty modelId still blocked classroom entry. Tests: - pnpm vitest run tests/store/settings-server-sync.test.ts -t "selects the server LLM model" - pnpm vitest run tests/store/settings-server-sync.test.ts - pnpm test - pnpm lint && pnpm check && pnpm build - CI=1 pnpm test:e2e Co-authored-by: Codex <codex@openai.com>

wyuc

Approving. Verified locally: reverting lib/store/settings.ts to parent makes exactly the new regression test fail; an e2e repro (default openai + empty modelId + server-only deepseek) is blocked by the model toast without this change and enters generation with it.

Nit (non-blocking): LLM fallback is gated by isServerConfigured while image/video fall back to models[0] unconditionally — asymmetry from #266; worth a later follow-up.

…580) (#581) * fix(settings): enforce "usable provider ⇒ concrete model" invariant (#580) Layer 1 — resolve provider AND model atomically at every source: - add resolveSelectedModel(): like validateModel but no empty-input short-circuit, so it returns '' only when the provider has no models - fetchServerProviders: symmetric LLM/image/video provider recovery + resolveSelectedModel; remove the asymmetric #577 isServerConfigured-gated LLM tail and the separate first-load auto-select block (now subsumed, and covers client-API-key providers too) - setProviderConfig: entering an API key for the active/unset LLM provider resolves a concrete model in the same set() - setVideoProvider: resolve a model on switch (was id-only); setImageProvider routed through the shared resolver Layer 2 — remove states unreachable under the invariant: - add shared isLLMProviderConfigured / hasUsableLLMProvider; gate the landing-page generate button on a usable provider (state A vs B) and drop the modelNotConfigured toast + forced SettingsDialog - generation-toolbar: use the shared predicate; drop the dead "Select Model" label/tooltip fallback - delete dead components/settings/model-selector.tsx (zero refs) and the now-orphan settings.selectModel i18n key (6 locales) Note: generation-toolbar's `providerEntries.length === 0` branch is KEPT — it is reachable via search (search-no-results), not the dead no-provider state the issue assumed. modelNotConfigured i18n key kept (still used by the chat path in use-chat-sessions). Tests: resolveSelectedModel + hasUsableLLMProvider unit tests; server-sync invariant tests (client-key resolution, atomic API-key entry, provider switch, serverModels-preference + first-load regression guards). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(settings): address code review — symmetric image/video config resolution Code review Important #1: fold atomic model resolution into setImageProviderConfig / setVideoProviderConfig (design doc Layer 1 step 2 scoped these). Deleting the selected custom image/video model on the active provider now resolves to a valid model in the same set() instead of leaving it stale until the next server sync — restores LLM/image/video symmetry. Minor #2/#3: document the deliberate store-level (credential path) vs UX-level (incl. ≥1 model) "usable" split on isProviderUsable; clarify that setProviderConfig also adopts providerId on the first-load API-key path. Tests: +2 invariant tests (delete selected custom image/video model → falls back to a valid model). 89/89 store tests pass, tsc clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(settings): keyless providers need explicit baseUrl to count as usable (#580) Code-review/e2e follow-up. ollama/lemonade ship requiresApiKey:false + a registry defaultBaseUrl, so the old isLLMProviderConfigured counted them as configured on a fresh install — but buildFallback/reconcile never auto-selects keyless providers, so hasUsableProvider could be true while modelId stayed '', re-enabling generation with no model (the exact #498/#580 class of bug, just via the keyless path). Narrow isLLMProviderConfigured to be consistent with isProviderUsable: keyless ⇒ usable only once the user sets an explicit baseUrl (registry defaultBaseUrl alone is not user intent); server-config and key-requiring paths unchanged. Now gate + toolbar + reconcile agree, and genuine State A (no usable provider) is reachable again. Add e2e/tests/model-invariant-580.spec.ts: State A (no provider → generate disabled + single "Set up model" affordance, no toast/forced dialog) and State B (server-configured → concrete model auto-resolved, generation enabled). +3 keyless unit tests. 92/92 store tests + 2 e2e pass, tsc clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(settings): re-resolve selection on bulk provider-config change (#580) Self-test found: deleting the selected provider in Settings left the model selector pointing at the deleted/invalid provider (or, via the component's hand-rolled fallback, at an unusable built-in like openai with no key). Root cause: setProvidersConfig (the bulk setter behind delete-provider / delete-model) did ZERO re-resolution — Layer 1 covered setProviderConfig (singular) + image/video setters + the reconcile, but not this path. The component's confirmDeleteProvider then picked Object.keys()[0] (always a built-in, ignoring usability), the exact per-call-site hand-rolled shape #580 condemns. Fix at the source: setProvidersConfig now re-resolves (providerId, modelId) via the canonical isLLMProviderConfigured + resolveSelectedModel — keep the current provider if still usable, else first usable, else State A. Removed the brittle confirmDeleteProvider override (kept only its local Settings-tab selection). Now every bulk config mutation (delete, import, reset) holds the invariant. Tests: +2 store tests (delete selected/only-usable provider → State A, not the deleted/invalid one; delete non-selected → keep usable selection). 94/94 store tests, full suite green (only pre-existing ssrf DNS), tsc/eslint clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(settings): one shared LLM-selection resolver for all config edits (#580) Clarified repro: the realistic case isn't deleting a custom provider — it's clearing the SELECTED provider's API key so it becomes invalid. setProvider Config only re-resolved when the edit made a provider *usable*; the "became unusable" transition fell through (return base) → the now-invalid provider stayed selected. The previous commit fixed only the bulk path. Root-cause fix: extract one shared `resolveLLMSelection(config, providerId, modelId)` — keep current provider if still usable, else first usable, else State A; then resolve the model. Use it in BOTH setProviderConfig (single edit: key add/clear, model-list edit) and setProvidersConfig (bulk: delete, import, reset). The two paths can no longer diverge — exactly the per-call- site asymmetry #580 set out to eliminate. Drops the now-unused isProviderUsable import. Tests: +2 store tests (clear selected provider key → drops stale selection; clear non-selected key → keeps usable selection). 96/96 store tests, full suite green except pre-existing ssrf DNS; tsc + eslint clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * style: prettier-format #580 test specs CI 'Lint, Typecheck & Unit Tests' ran prettier . --check and failed on the two added test specs (only tsc/eslint/vitest were run locally). Pure whitespace reformat, no logic change — 96/96 store tests still green. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

xuruiray marked this pull request as ready for review May 17, 2026 06:38

xuruiray mentioned this pull request May 17, 2026

[Bug]: 配置不成功 #498

Closed

wyuc approved these changes May 17, 2026

View reviewed changes

Merge branch 'main' into codex/fix-issue-498-llm-model-autoselect

64488ce

wyuc merged commit 2ed39da into THU-MAIC:main May 17, 2026
2 checks passed

This was referenced May 17, 2026

Follow-up (#577/#498): make "usable provider ⇒ a concrete model is always selected" an invariant, and align it across code + UX #580

Closed

fix(settings): enforce "usable provider ⇒ concrete model" invariant (#580) #581

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] fix server LLM model auto-selection#577

[codex] fix server LLM model auto-selection#577
wyuc merged 2 commits into
THU-MAIC:mainfrom
xuruiray:codex/fix-issue-498-llm-model-autoselect

xuruiray commented May 17, 2026

Uh oh!

wyuc left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

xuruiray commented May 17, 2026

Summary

Related Issues

Changes

Type of Change

Verification

Steps to reproduce / test

What you personally verified

Evidence

Checklist

Uh oh!

wyuc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants