Skip to content

[codex] fix server LLM model auto-selection#577

Merged
wyuc merged 2 commits into
THU-MAIC:mainfrom
xuruiray:codex/fix-issue-498-llm-model-autoselect
May 17, 2026
Merged

[codex] fix server LLM model auto-selection#577
wyuc merged 2 commits into
THU-MAIC:mainfrom
xuruiray:codex/fix-issue-498-llm-model-autoselect

Conversation

@xuruiray
Copy link
Copy Markdown
Contributor

Summary

Fix server-configured LLM startup so a provider selected by server sync also gets a usable default model when the current modelId is empty.

Related Issues

Fixes #498

Changes

  • Auto-fill the first available model for a server-configured LLM provider when server sync falls back from an unusable default provider, matching the existing image/video recovery behavior.
  • Add a regression test that mirrors the reported DeepSeek server-only setup: default openai provider, empty modelId, server response containing deepseek with deepseek-chat.
  • Add a narrow TypeScript compatibility fix for the OpenAI-compatible fetch wrapper so next build passes with the current Next/TypeScript fetch type.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactoring (no functional changes)
  • CI/CD or build changes

Verification

Steps to reproduce / test

  1. Start from a store state with providerId: openai and an empty modelId.
  2. Mock /api/server-providers to return only deepseek with models: ['deepseek-chat'].
  3. Run fetchServerProviders() and verify the store selects providerId: deepseek and modelId: deepseek-chat.
  4. Run the full local validation commands listed below.

What you personally verified

  • RED: pnpm vitest run tests/store/settings-server-sync.test.ts -t "selects the server LLM model" failed before the fix with expected '' to be 'deepseek-chat'.
  • GREEN: the same targeted test passed after the store sync change.
  • pnpm vitest run tests/store/settings-server-sync.test.ts passed: 47 tests.
  • pnpm test passed: 45 test files, 334 tests.
  • pnpm lint && pnpm check && pnpm build passed. Lint reports 7 existing warnings and 0 errors.
  • pnpm exec playwright install chromium was run because the local Playwright browser cache was missing.
  • CI=1 pnpm test:e2e passed: 11 Playwright tests. I used CI mode because a pre-existing next dev process held .next/dev/lock on port 3000; CI mode starts the production server on 3002 without touching that process.

Evidence

  • CI-style local checks passed (pnpm lint && pnpm check && pnpm build)
  • Unit/integration tests passed (pnpm test)
  • E2E tests passed (CI=1 pnpm test:e2e)
  • Manually verified the regression through a dedicated store sync test

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have added/updated documentation as needed
  • My changes do not introduce new warnings

Summary:
- Auto-fill the first server-configured LLM model when provider sync falls back from an unusable default provider.
- Add a regression test for the DeepSeek server-only configuration reported in THU-MAIC#498.
- Keep the OpenAI-compatible fetch wrapper buildable with Next's fetch type.

Rationale:
- Server-configured providers were marked usable, but an empty modelId still blocked classroom entry.

Tests:
- pnpm vitest run tests/store/settings-server-sync.test.ts -t "selects the server LLM model"
- pnpm vitest run tests/store/settings-server-sync.test.ts
- pnpm test
- pnpm lint && pnpm check && pnpm build
- CI=1 pnpm test:e2e

Co-authored-by: Codex <codex@openai.com>
@xuruiray xuruiray marked this pull request as ready for review May 17, 2026 06:38
@xuruiray xuruiray mentioned this pull request May 17, 2026
Copy link
Copy Markdown
Contributor

@wyuc wyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Verified locally: reverting lib/store/settings.ts to parent makes exactly the new regression test fail; an e2e repro (default openai + empty modelId + server-only deepseek) is blocked by the model toast without this change and enters generation with it.

Nit (non-blocking): LLM fallback is gated by isServerConfigured while image/video fall back to models[0] unconditionally — asymmetry from #266; worth a later follow-up.

@wyuc wyuc merged commit 2ed39da into THU-MAIC:main May 17, 2026
2 checks passed
cosarah pushed a commit that referenced this pull request May 18, 2026
…580) (#581)

* fix(settings): enforce "usable provider ⇒ concrete model" invariant (#580)

Layer 1 — resolve provider AND model atomically at every source:
- add resolveSelectedModel(): like validateModel but no empty-input
  short-circuit, so it returns '' only when the provider has no models
- fetchServerProviders: symmetric LLM/image/video provider recovery +
  resolveSelectedModel; remove the asymmetric #577 isServerConfigured-gated
  LLM tail and the separate first-load auto-select block (now subsumed,
  and covers client-API-key providers too)
- setProviderConfig: entering an API key for the active/unset LLM provider
  resolves a concrete model in the same set()
- setVideoProvider: resolve a model on switch (was id-only); setImageProvider
  routed through the shared resolver

Layer 2 — remove states unreachable under the invariant:
- add shared isLLMProviderConfigured / hasUsableLLMProvider; gate the
  landing-page generate button on a usable provider (state A vs B) and drop
  the modelNotConfigured toast + forced SettingsDialog
- generation-toolbar: use the shared predicate; drop the dead "Select Model"
  label/tooltip fallback
- delete dead components/settings/model-selector.tsx (zero refs) and the
  now-orphan settings.selectModel i18n key (6 locales)

Note: generation-toolbar's `providerEntries.length === 0` branch is KEPT —
it is reachable via search (search-no-results), not the dead no-provider
state the issue assumed. modelNotConfigured i18n key kept (still used by
the chat path in use-chat-sessions).

Tests: resolveSelectedModel + hasUsableLLMProvider unit tests; server-sync
invariant tests (client-key resolution, atomic API-key entry, provider
switch, serverModels-preference + first-load regression guards).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(settings): address code review — symmetric image/video config resolution

Code review Important #1: fold atomic model resolution into
setImageProviderConfig / setVideoProviderConfig (design doc Layer 1 step 2
scoped these). Deleting the selected custom image/video model on the active
provider now resolves to a valid model in the same set() instead of leaving
it stale until the next server sync — restores LLM/image/video symmetry.

Minor #2/#3: document the deliberate store-level (credential path) vs
UX-level (incl. ≥1 model) "usable" split on isProviderUsable; clarify that
setProviderConfig also adopts providerId on the first-load API-key path.

Tests: +2 invariant tests (delete selected custom image/video model →
falls back to a valid model). 89/89 store tests pass, tsc clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(settings): keyless providers need explicit baseUrl to count as usable (#580)

Code-review/e2e follow-up. ollama/lemonade ship requiresApiKey:false +
a registry defaultBaseUrl, so the old isLLMProviderConfigured counted them
as configured on a fresh install — but buildFallback/reconcile never
auto-selects keyless providers, so hasUsableProvider could be true while
modelId stayed '', re-enabling generation with no model (the exact #498/#580
class of bug, just via the keyless path).

Narrow isLLMProviderConfigured to be consistent with isProviderUsable:
keyless ⇒ usable only once the user sets an explicit baseUrl (registry
defaultBaseUrl alone is not user intent); server-config and key-requiring
paths unchanged. Now gate + toolbar + reconcile agree, and genuine State A
(no usable provider) is reachable again.

Add e2e/tests/model-invariant-580.spec.ts: State A (no provider → generate
disabled + single "Set up model" affordance, no toast/forced dialog) and
State B (server-configured → concrete model auto-resolved, generation
enabled). +3 keyless unit tests. 92/92 store tests + 2 e2e pass, tsc clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(settings): re-resolve selection on bulk provider-config change (#580)

Self-test found: deleting the selected provider in Settings left the model
selector pointing at the deleted/invalid provider (or, via the component's
hand-rolled fallback, at an unusable built-in like openai with no key).

Root cause: setProvidersConfig (the bulk setter behind delete-provider /
delete-model) did ZERO re-resolution — Layer 1 covered setProviderConfig
(singular) + image/video setters + the reconcile, but not this path. The
component's confirmDeleteProvider then picked Object.keys()[0] (always a
built-in, ignoring usability), the exact per-call-site hand-rolled shape
#580 condemns.

Fix at the source: setProvidersConfig now re-resolves (providerId, modelId)
via the canonical isLLMProviderConfigured + resolveSelectedModel — keep the
current provider if still usable, else first usable, else State A. Removed
the brittle confirmDeleteProvider override (kept only its local Settings-tab
selection). Now every bulk config mutation (delete, import, reset) holds the
invariant.

Tests: +2 store tests (delete selected/only-usable provider → State A, not
the deleted/invalid one; delete non-selected → keep usable selection).
94/94 store tests, full suite green (only pre-existing ssrf DNS), tsc/eslint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(settings): one shared LLM-selection resolver for all config edits (#580)

Clarified repro: the realistic case isn't deleting a custom provider — it's
clearing the SELECTED provider's API key so it becomes invalid. setProvider
Config only re-resolved when the edit made a provider *usable*; the
"became unusable" transition fell through (return base) → the now-invalid
provider stayed selected. The previous commit fixed only the bulk path.

Root-cause fix: extract one shared `resolveLLMSelection(config, providerId,
modelId)` — keep current provider if still usable, else first usable, else
State A; then resolve the model. Use it in BOTH setProviderConfig (single
edit: key add/clear, model-list edit) and setProvidersConfig (bulk: delete,
import, reset). The two paths can no longer diverge — exactly the per-call-
site asymmetry #580 set out to eliminate. Drops the now-unused
isProviderUsable import.

Tests: +2 store tests (clear selected provider key → drops stale selection;
clear non-selected key → keeps usable selection). 96/96 store tests, full
suite green except pre-existing ssrf DNS; tsc + eslint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style: prettier-format #580 test specs

CI 'Lint, Typecheck & Unit Tests' ran prettier . --check and failed on the
two added test specs (only tsc/eslint/vitest were run locally). Pure
whitespace reformat, no logic change — 96/96 store tests still green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: 配置不成功

2 participants