feat: report initialized backend #288

Merged: DenisovAV merged 5 commits into DenisovAV:main from merlinnot:codex/report-active-backend on May 26, 2026

@merlinnot (Contributor) commented May 21, 2026

Summary

  • expose InferenceModel.activeBackend so callers can inspect the backend actually initialized by the runtime
  • make FFI .litertlm initialization try backend fallback inside the plugin and return the backend that succeeded
  • share and test the FFI backend fallback order / LiteRT-LM wire names
  • report null for web and MediaPipe runtimes where the final initialized backend is not currently observable

Motivation

Callers can request a preferred backend, but the runtime may fall back. This gives applications a way to show users the compute backend actually in use without implementing their own fallback or probe loop.
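
As a rough illustration of the use case described above, a caller could compare the requested backend with the one reported after initialization. This is a minimal sketch: `Backend`, `ModelWithBackend`, `FakeModel`, and `describeBackend` are hypothetical stand-ins, since the thread does not show the package's real enum or interface, only that `activeBackend` may be null.

```dart
// Hypothetical stand-ins: the PR thread does not show the real enum or the
// real InferenceModel interface, so these names are assumptions.
enum Backend { cpu, gpu, npu }

abstract class ModelWithBackend {
  /// Backend the runtime actually initialized, or null when the runtime
  /// does not expose it (the PR reports null for web and MediaPipe).
  Backend? get activeBackend;
}

/// Test double that reports a fixed backend.
class FakeModel implements ModelWithBackend {
  FakeModel(this.activeBackend);

  @override
  final Backend? activeBackend;
}

/// Builds a user-facing label from the requested and the active backend.
String describeBackend(Backend? requested, ModelWithBackend model) {
  final active = model.activeBackend;
  if (active == null) {
    return 'backend: unknown';
  }
  final name = active.name.toUpperCase();
  if (requested != null && requested != active) {
    return 'backend: $name (fell back from ${requested.name.toUpperCase()})';
  }
  return 'backend: $name';
}
```

With this sketch, a model that was asked for NPU but reports CPU would be labeled as a fallback, and a null `activeBackend` would be shown as unknown rather than guessed.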

Tests

  • test/core/ffi/backend_preference_test.dart covers NPU -> GPU -> CPU, GPU -> CPU, null -> GPU -> CPU, CPU-only fallback, and wire names
  • test/core/ffi/ffi_inference_model_test.dart covers that the FFI model exposes the initialized backend
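
The fallback orders listed in the first test bullet can be expressed as a small pure function. The sketch below is an assumption about shape only: `Backend` and `fallbackOrder` are hypothetical names, and the LiteRT-LM wire names are left out because the thread does not state them.

```dart
// Hypothetical enum standing in for the package's backend preference type.
enum Backend { cpu, gpu, npu }

/// Returns the backends to try, in order, for a requested preference.
/// Mirrors the orders named in the test description:
/// NPU -> GPU -> CPU, GPU -> CPU, null -> GPU -> CPU, and CPU only.
List<Backend> fallbackOrder(Backend? preferred) {
  if (preferred == Backend.npu) {
    return const [Backend.npu, Backend.gpu, Backend.cpu];
  }
  if (preferred == Backend.cpu) {
    return const [Backend.cpu];
  }
  // GPU, or no stated preference: try GPU first, then CPU.
  return const [Backend.gpu, Backend.cpu];
}
```

Keeping the order in a function with no side effects is what lets it be tested on the host without loading a native runtime.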

Validation

  • flutter pub get
  • dart analyze lib/flutter_gemma_interface.dart lib/core/ffi/backend_preference.dart lib/core/ffi/ffi_inference_model.dart lib/core/ffi/ffi_inference_model_stub.dart lib/mobile/flutter_gemma_mobile_inference_model.dart lib/web/flutter_gemma_web.dart lib/mobile/flutter_gemma_mobile.dart lib/desktop/flutter_gemma_desktop.dart test/core/ffi/backend_preference_test.dart test/core/ffi/ffi_inference_model_test.dart
  • flutter test test/core/ffi/backend_preference_test.dart test/core/ffi/ffi_inference_model_test.dart test/core/litertlm_file_type_test.dart test/desktop_vision_params_test.dart --reporter=failures-only
  • git diff --check

Notes

flutter pub get updated local lockfile resolution for meta and test_api with this Flutter SDK, but those lockfile changes were intentionally not included in the commit.

@merlinnot marked this pull request as ready for review May 21, 2026 20:37

@DenisovAV (Owner)

Thanks for this — the activeBackend getter + automatic fallback chain
is exactly the observability hook that's been missing. The pure-function
extraction in backend_preference.dart reads well, tests are well-scoped,
and the diff rebases cleanly on top of release/0.16.2.

Three things I'd like to address before merging, all in the fallback
loop in flutter_gemma_desktop.dart:387 / flutter_gemma_mobile.dart:692:

  1. Throw the last error, not the first, or aggregate. Today
    firstError ??= error means that when every backend fails, a user
    who requested NPU sees the NPU error, not whatever made the final
    CPU attempt fail. The last attempt is the configuration they would
    actually have run on, so its error is usually the most actionable.
    Even better: a small BackendInitException(attempts: [...]) so the
    chain is inspectable.

  2. on Object catch is wider than intended. It traps OutOfMemoryError,
    StackOverflowError, AssertionError, etc., which are Errors (resource
    exhaustion or programming bugs) rather than backend-capability
    problems. Use either on Exception catch (e), or
    on Object catch (e) { if (e is OutOfMemoryError || e is StackOverflowError) rethrow; ... }.

  3. Silent NPU → CPU fallback in release builds. debugPrint is
    debug-only — in a packaged app the user gets a CPU model and has no
    visible signal that NPU failed unless they explicitly read
    activeBackend. Could you switch the fallback log to
    developer.log(level: 900 /* WARNING */, name: 'flutter_gemma') so
    it survives release builds? And maybe note the silent-fallback
    semantics in the activeBackend dartdoc so callers know they're
    expected to check.
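
One way the three requests above could fit together is sketched below. It is not the code from the PR: `Backend`, `BackendAttempt`, and `initializeWithFallback` are hypothetical, `BackendInitException` only borrows the name proposed in the review, and the helper is synchronous for brevity where a real initializer would likely be asynchronous.

```dart
import 'dart:developer' as developer;

// Hypothetical enum standing in for the package's backend type.
enum Backend { cpu, gpu, npu }

/// One failed initialization attempt.
class BackendAttempt {
  const BackendAttempt(this.backend, this.error);

  final Backend backend;
  final Exception error;
}

/// Thrown when every backend in the chain failed; keeps all attempts
/// so the caller can inspect the whole chain, not only one error.
class BackendInitException implements Exception {
  BackendInitException(this.attempts);

  final List<BackendAttempt> attempts;

  @override
  String toString() {
    final details =
        attempts.map((a) => '${a.backend.name}: ${a.error}').join('; ');
    return 'BackendInitException($details)';
  }
}

/// Tries [order] front to back and returns the first backend for which
/// [init] completes without throwing an Exception.
Backend initializeWithFallback(
  List<Backend> order,
  void Function(Backend backend) init,
) {
  if (order.isEmpty) {
    // Makes the "order is never empty" invariant explicit.
    throw StateError('fallback order must not be empty');
  }
  final attempts = <BackendAttempt>[];
  for (final backend in order) {
    try {
      init(backend);
      return backend;
    } on Exception catch (error) {
      // Only Exceptions are treated as "backend unavailable";
      // Errors such as StateError propagate to the caller.
      attempts.add(BackendAttempt(backend, error));
      developer.log(
        'Backend ${backend.name} failed to initialize: $error',
        name: 'flutter_gemma',
        level: 900, // WARNING
        error: error,
      );
    }
  }
  throw BackendInitException(attempts);
}
```

In this sketch the last entry of `attempts` is the final backend tried, so a caller who wants only the most actionable error can read `attempts.last.error`.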

Smaller items, fine to fold into the same revision or leave for later:

  • The two _initializeFfiInferenceRuntime helpers in desktop and mobile
    are byte-identical except for the log tag — could live in
    core/ffi/backend_preference.dart with the tag as a parameter.
  • The firstError! bang at the end of each helper is logically safe
    (the order list is always non-empty) but a small if (firstError == null) throw StateError(...) would make that invariant explicit and
    survive future enum edits.

Sasha

@merlinnot (Contributor, Author)

Thanks for the review, Sasha. I addressed the fallback loop feedback in the latest commits:

  • FFI backend initialization now uses a shared fallback helper for desktop/mobile.
  • Failed fallback chains now throw BackendInitException with inspectable per-backend attempts.
  • The helper catches Exception only, so programming/runtime Errors are not swallowed.
  • Fallback warnings now use developer.log(level: 900, name: 'flutter_gemma').
  • activeBackend docs now call out silent fallback behavior.
  • The new aggregate exception types are exported from the public package API.
  • Added focused tests for fallback order, successful fallback, aggregate failure details, public export coverage, and non-caught programming errors.
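
The "non-caught programming errors" case in the last bullet rests on Dart's split between `Exception` and `Error`. A minimal sketch of such a check follows; `captureException` and `propagatesError` are hypothetical helpers, not names from the PR's test suite.

```dart
/// Runs [attempt] and returns the Exception it threw, or null on success.
/// Errors (StateError, ArgumentError, ...) are deliberately not caught.
Exception? captureException(void Function() attempt) {
  try {
    attempt();
    return null;
  } on Exception catch (e) {
    return e;
  }
}

/// Returns true when a StateError thrown inside the helper reaches the
/// caller instead of being swallowed by the `on Exception` clause.
bool propagatesError() {
  try {
    captureException(() => throw StateError('bug'));
    return false;
  } on StateError {
    return true;
  }
}
```

A `FormatException` passed through `captureException` is returned as a value, while the `StateError` escapes, which is the behavior the narrowed catch is meant to guarantee.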

Validation:

  • flutter test --no-pub
  • targeted flutter analyze --no-pub
  • GitHub analyze-and-test is green

@DenisovAV (Owner)

Thanks Natan. CI green, all points addressed. Will fold into
release/0.16.2.

Sasha

@DenisovAV merged commit d56ce45 into DenisovAV:main on May 26, 2026
3 checks passed

DenisovAV added a commit that referenced this pull request on May 26, 2026:
PR #288 added `activeBackend` as a required getter on `InferenceModel`.
The web LiteRT-LM inference path (LiteRtLmWebInferenceModel) was
introduced separately in release/0.16.2 and didn't pick up the new
member during the merge — implement it as null (same as
WebInferenceModel: the @litert-lm/core engine doesn't surface a final
backend).

Also document #288 in the 0.16.2 CHANGELOG entry.

DenisovAV added a commit that referenced this pull request on May 29, 2026:
Wire openSession() into FfiInferenceModel (#226). Each open session
owns its own LiteRtLmConversationHandle — independent KV cache,
history, and raw-response buffer.

- Extract ConversationHandle interface (litert_lm_client.dart) so the
  session depends on an abstraction the test layer can fake.
  LiteRtLmConversationHandle implements it.
- FfiInferenceModelSession now takes a ConversationHandle instead of
  the shared LiteRtLmFfiClient; routes chat/chatRaw/cancel/metrics/close
  through its own handle. Static extractTextFromResponse is unchanged.
- FfiInferenceModel: createSession() uses createConversationHandle()
  for the legacy singleton lane (overwrite + close old). New
  openSession() appends a detached session to _openSessions. sessions
  getter returns the union. close() cascade-closes every session in
  both lanes before engine shutdown.

Verification (host VM, no native engine needed):
- test/core/ffi/multi_session_test.dart — 6 tests using a fake
  ConversationHandle. Proves session-level isolation: two sessions with
  distinct handles produce distinct outputs ("I am A" vs "I am B");
  close() of one doesn't touch the other; Gemma 4 raw-response capture;
  StateError after close; stopGeneration routes to the handle.
- flutter analyze clean; full 387-test suite green.

The fake-handle seam mirrors PR #288's injectable-client pattern and
lets us verify multi-session orchestration without the native build
(macOS `flutter build` is blocked by a preexisting Xcode script-phase
cycle, tracked separately).
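
The fake-handle seam described in this commit message might look roughly like the sketch below. The signatures are simplified assumptions: the real `ConversationHandle` in litert_lm_client.dart is not shown in this thread, and `SessionSketch` and `FakeHandle` are hypothetical names.

```dart
// Hypothetical, simplified interface: the real ConversationHandle is
// not shown in the thread, so these members are assumptions.
abstract class ConversationHandle {
  Future<String> chat(String prompt);
  void close();
}

/// A session that owns exactly one handle and routes calls through it.
class SessionSketch {
  SessionSketch(this._handle);

  final ConversationHandle _handle;
  bool _closed = false;

  Future<String> send(String prompt) {
    if (_closed) {
      throw StateError('session is closed');
    }
    return _handle.chat(prompt);
  }

  void close() {
    if (_closed) return;
    _closed = true;
    _handle.close();
  }
}

/// Test double: returns a canned reply and records whether it was closed.
class FakeHandle implements ConversationHandle {
  FakeHandle(this.reply);

  final String reply;
  bool closed = false;

  @override
  Future<String> chat(String prompt) async => reply;

  @override
  void close() {
    closed = true;
  }
}
```

Because each session holds its own handle, two sessions built on `FakeHandle('I am A')` and `FakeHandle('I am B')` answer independently, and closing one leaves the other handle open. That is the isolation property the commit's tests are described as proving without a native engine.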

DenisovAV added a commit that referenced this pull request on May 29, 2026:
Wire openSession() into LiteRtLmWebInferenceModel (#226). Each open
session owns its own @litert-lm/core Conversation JS object —
independent KV cache + history; the upstream Engine.createConversation()
already supports multiple conversations per Engine.

- Extract _buildConversation() helper shared by createSession (legacy
  singleton, overwrites _session) and openSession (detached, appends to
  _openSessions). Keeps the sampler/preface/tools/thinking JS interop
  wiring in one place so the two lanes can't drift.
- Add _openSessions Set + sessions getter (union of legacy + open) +
  close() cascade-closes both lanes before engine.delete().
- Vision/audio remain force-disabled on both lanes (upstream
  @litert-lm/core@0.12.1 doesn't expose the executor setters).

Also fix a preexisting web-compile gap: litert_lm_client_stub.dart was
missing shutdown(), which flutter_gemma_mobile.dart references via the
PR #288 initializeFfiRuntime shutdownClient callback. The chrome web
test wasn't run after #288 so it stayed latent; add the stub method.

Verified: flutter analyze clean; web library compiles in Chrome
(test/web/litert_lm_web_test.dart); 393 host unit tests green.
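
The two-lane bookkeeping described in these commit messages (a legacy singleton session plus a set of detached sessions, with a cascading close) can be sketched as follows. `SessionStub` and `ModelSketch` are hypothetical; only the field names `_session` and `_openSessions` are taken from the commit text, and engine shutdown is reduced to a comment.

```dart
/// Minimal session stand-in that only tracks whether it was closed.
class SessionStub {
  bool closed = false;

  void close() {
    closed = true;
  }
}

class ModelSketch {
  SessionStub? _session; // legacy singleton lane
  final Set<SessionStub> _openSessions = {}; // detached lane

  /// Legacy lane: closes the previous session and overwrites it.
  SessionStub createSession() {
    _session?.close();
    final created = SessionStub();
    _session = created;
    return created;
  }

  /// Detached lane: every call adds an independent session.
  SessionStub openSession() {
    final opened = SessionStub();
    _openSessions.add(opened);
    return opened;
  }

  /// Union of the legacy session (if any) and all open sessions.
  List<SessionStub> get sessions {
    final legacy = _session;
    return [if (legacy != null) legacy, ..._openSessions];
  }

  /// Closes every session in both lanes before the engine would be
  /// shut down.
  void close() {
    for (final session in sessions) {
      session.close();
    }
    _session = null;
    _openSessions.clear();
    // Engine shutdown would happen here, after all sessions are closed.
  }
}
```

Building both kinds of session through one model object keeps the close order in a single place, which is the same motivation the commit gives for sharing `_buildConversation()` between the two lanes.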

DenisovAV added a commit that referenced this pull request on May 29, 2026:
…ng (#294)

Concurrent sessions (#226): openSession()/openChat() serve independent dialogues on one loaded model — .litertlm (FFI native + web) and .task (MediaPipe Android/iOS). Concurrent contexts, serialized inference; maxConcurrentSessions cap. Web .litertlm inference via @litert-lm/core (early preview, text-only). Backend reporting (#288), getActiveModel restore (#227), web function calling. Reviewed (5 CRITICAL concurrency/error-handling fixes) and verified on macOS 23/23 incl. edge cases.