feat: report initialized backend by merlinnot · Pull Request #288 · DenisovAV/flutter_gemma

merlinnot · 2026-05-21T20:09:13Z

Summary

- expose InferenceModel.activeBackend so callers can inspect the backend actually initialized by the runtime
- make FFI .litertlm initialization try backend fallback inside the plugin and return the backend that succeeded
- share and test the FFI backend fallback order / LiteRT-LM wire names
- report null for web and MediaPipe runtimes where the final initialized backend is not currently observable
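
As a rough illustration of how a caller might consume the new getter, here is a minimal sketch. The `Backend` enum, the trimmed-down `InferenceModel` interface, `FakeModel`, and `backendLabel` are hypothetical stand-ins written for this example; only the `activeBackend` name and its null-when-unobservable behavior come from the summary above, and the plugin's real types may differ.

```dart
// Hypothetical stand-in for the plugin's backend enum.
enum Backend { cpu, gpu, npu }

// Hypothetical, trimmed-down view of the model interface.
abstract class InferenceModel {
  /// Backend the runtime actually initialized, or null when the runtime
  /// cannot observe it (web and MediaPipe, per the PR summary).
  Backend? get activeBackend;
}

// Test double so the sketch needs no native runtime.
class FakeModel implements InferenceModel {
  FakeModel(this.activeBackend);

  @override
  final Backend? activeBackend;
}

/// Maps the reported backend to a user-facing label.
String backendLabel(InferenceModel model) {
  final backend = model.activeBackend;
  // A null value means "not observable", not "no backend".
  return backend == null ? 'unknown' : backend.name.toUpperCase();
}
```

An app could call `backendLabel(model)` after initialization to show, for example, "GPU" in a settings screen, and fall back to "unknown" on runtimes that report null.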

Motivation

Callers can request a preferred backend, but the runtime may fall back. This gives applications a way to show users the compute backend actually in use without implementing their own fallback or probe loop.

Tests

- test/core/ffi/backend_preference_test.dart covers NPU -> GPU -> CPU, GPU -> CPU, null -> GPU -> CPU, CPU-only fallback, and wire names
- test/core/ffi/ffi_inference_model_test.dart covers that the FFI model exposes the initialized backend
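
The fallback orders listed in the first test can be read as a small pure function. The sketch below is an assumption about its shape, not the contents of backend_preference.dart: the `Backend` enum and the `fallbackOrder` name are hypothetical, and "CPU-only fallback" is interpreted here as a CPU request yielding only CPU. LiteRT-LM wire names are left out because the thread does not state them.

```dart
// Hypothetical stand-in for the plugin's backend enum.
enum Backend { cpu, gpu, npu }

/// Returns the backends to try, in order, for a requested preference.
///
/// Orders follow the PR's test description:
///   NPU  -> NPU, GPU, CPU
///   GPU  -> GPU, CPU
///   null -> GPU, CPU
///   CPU  -> CPU (assumed meaning of "CPU-only fallback")
List<Backend> fallbackOrder(Backend? preferred) {
  if (preferred == Backend.npu) {
    return const [Backend.npu, Backend.gpu, Backend.cpu];
  }
  if (preferred == Backend.cpu) {
    return const [Backend.cpu];
  }
  // Both an explicit GPU request and "no preference" start at GPU.
  return const [Backend.gpu, Backend.cpu];
}
```

Keeping the order as a pure function means it can be unit-tested on the host VM without loading a model.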

Validation

- flutter pub get
- dart analyze lib/flutter_gemma_interface.dart lib/core/ffi/backend_preference.dart lib/core/ffi/ffi_inference_model.dart lib/core/ffi/ffi_inference_model_stub.dart lib/mobile/flutter_gemma_mobile_inference_model.dart lib/web/flutter_gemma_web.dart lib/mobile/flutter_gemma_mobile.dart lib/desktop/flutter_gemma_desktop.dart test/core/ffi/backend_preference_test.dart test/core/ffi/ffi_inference_model_test.dart
- flutter test test/core/ffi/backend_preference_test.dart test/core/ffi/ffi_inference_model_test.dart test/core/litertlm_file_type_test.dart test/desktop_vision_params_test.dart --reporter=failures-only
- git diff --check

Notes

flutter pub get updated local lockfile resolution for meta and test_api with this Flutter SDK, but those lockfile changes were intentionally not included in the commit.

DenisovAV · 2026-05-26T11:02:41Z

Thanks for this — the activeBackend getter + automatic fallback chain
is exactly the observability hook that's been missing. The pure-function
extraction in backend_preference.dart reads well, tests are well-scoped,
and the diff rebases cleanly on top of release/0.16.2.

Three things I'd like to address before merging, all in the fallback
loop in flutter_gemma_desktop.dart:387 / flutter_gemma_mobile.dart:692:

1. Throw the last error, not the first, or aggregate. Today
   firstError ??= error means a user who requested NPU and effectively
   ran on CPU sees the NPU error, not whatever made CPU fail (if it
   did). When all backends fail, the last attempt is the configuration
   they actually got; it's usually the most actionable. Even better:
   a small BackendInitException(attempts: [...]) so the chain is
   inspectable.
2. on Object catch is wider than intended. It traps OutOfMemoryError,
   StackOverflowError, AssertionError etc., which are programming bugs
   rather than backend-capability problems. Either on Exception catch,
   or on Object catch (e) { if (e is OutOfMemoryError || e is StackOverflowError) rethrow; ... }.
3. Silent NPU → CPU fallback in release builds. debugPrint is
   debug-only: in a packaged app the user gets a CPU model and has no
   visible signal that NPU failed unless they explicitly read
   activeBackend. Could you switch the fallback log to
   developer.log(level: 900 /* WARNING */, name: 'flutter_gemma') so
   it survives release builds? And maybe note the silent-fallback
   semantics in the activeBackend dartdoc so callers know they're
   expected to check.
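
Taken together, the three points suggest a loop along the following lines. This is a sketch under stated assumptions rather than the plugin's code: `Backend`, `BackendAttempt`, the positional constructor of `BackendInitException`, and `initializeWithFallback` are hypothetical shapes chosen for illustration. Only the `BackendInitException` name, the Exception-only catch, and the `developer.log` level and name come from the review.

```dart
import 'dart:developer' as developer;

// Hypothetical stand-in for the plugin's backend enum.
enum Backend { cpu, gpu, npu }

/// One failed initialization attempt (hypothetical shape).
class BackendAttempt {
  BackendAttempt(this.backend, this.error);
  final Backend backend;
  final Exception error;
}

/// Thrown when every backend in the chain fails (hypothetical shape).
class BackendInitException implements Exception {
  BackendInitException(this.attempts);
  final List<BackendAttempt> attempts;

  @override
  String toString() {
    final details =
        attempts.map((a) => '${a.backend.name}: ${a.error}').join('; ');
    return 'BackendInitException($details)';
  }
}

/// Tries each backend in [order] and returns the first one that initializes.
Future<Backend> initializeWithFallback(
  List<Backend> order,
  Future<void> Function(Backend backend) init,
) async {
  // Make the non-empty invariant explicit instead of relying on a `!`.
  if (order.isEmpty) {
    throw StateError('Backend fallback order must not be empty');
  }
  final attempts = <BackendAttempt>[];
  for (final backend in order) {
    try {
      await init(backend);
      return backend;
    } on Exception catch (error) {
      // Only Exceptions count as capability failures; Errors such as
      // StateError or OutOfMemoryError propagate to the caller.
      attempts.add(BackendAttempt(backend, error));
      developer.log(
        'Backend ${backend.name} failed to initialize: $error',
        level: 900, // WARNING
        name: 'flutter_gemma',
      );
    }
  }
  // Every attempt failed: surface the whole chain, not just one error.
  throw BackendInitException(List.unmodifiable(attempts));
}
```

A caller that requests NPU on a device where only CPU works would get `Backend.cpu` back, while a caller on a device where nothing works can inspect `attempts` to see why each backend was rejected.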

Smaller items, fine to fold into the same revision or leave for later:

- The two _initializeFfiInferenceRuntime helpers in desktop and mobile
  are byte-identical except for the log tag; they could live in
  core/ffi/backend_preference.dart with the tag as a parameter.
- The firstError! bang at the end of each helper is logically safe
  (the order list is always non-empty) but a small
  if (firstError == null) throw StateError(...) would make that
  invariant explicit and survive future enum edits.

Sasha

merlinnot · 2026-05-26T19:58:35Z

Thanks for the review, Sasha. I addressed the fallback loop feedback in the latest commits:

- FFI backend initialization now uses a shared fallback helper for desktop/mobile.
- Failed fallback chains now throw BackendInitException with inspectable per-backend attempts.
- The helper catches Exception only, so programming/runtime Errors are not swallowed.
- Fallback warnings now use developer.log(level: 900, name: 'flutter_gemma').
- activeBackend docs now call out silent fallback behavior.
- The new aggregate exception types are exported from the public package API.
- Added focused tests for fallback order, successful fallback, aggregate failure details, public export coverage, and non-caught programming errors.

Validation:

- flutter test --no-pub
- targeted flutter analyze --no-pub
- GitHub analyze-and-test is green

DenisovAV · 2026-05-26T21:25:25Z

Thanks Natan. CI green, all points addressed. Will fold into
release/0.16.2.

Sasha

PR #288 added `activeBackend` as a required getter on `InferenceModel`. The web LiteRT-LM inference path (LiteRtLmWebInferenceModel) was introduced separately in release/0.16.2 and didn't pick up the new member during the merge — implement it as null (same as WebInferenceModel: the @litert-lm/core engine doesn't surface a final backend). Also document #288 in the 0.16.2 CHANGELOG entry.

Wire openSession() into FfiInferenceModel (#226).

Each open session owns its own LiteRtLmConversationHandle: independent KV cache, history, and raw-response buffer.

- Extract ConversationHandle interface (litert_lm_client.dart) so the session depends on an abstraction the test layer can fake. LiteRtLmConversationHandle implements it.
- FfiInferenceModelSession now takes a ConversationHandle instead of the shared LiteRtLmFfiClient; routes chat/chatRaw/cancel/metrics/close through its own handle. Static extractTextFromResponse is unchanged.
- FfiInferenceModel: createSession() uses createConversationHandle() for the legacy singleton lane (overwrite + close old). New openSession() appends a detached session to _openSessions. sessions getter returns the union. close() cascade-closes every session in both lanes before engine shutdown.

Verification (host VM, no native engine needed):

- test/core/ffi/multi_session_test.dart: 6 tests using a fake ConversationHandle. Proves session-level isolation: two sessions with distinct handles produce distinct outputs ("I am A" vs "I am B"); close() of one doesn't touch the other; Gemma 4 raw-response capture; StateError after close; stopGeneration routes to the handle.
- flutter analyze clean; full 387-test suite green.

The fake-handle seam mirrors PR #288's injectable-client pattern and lets us verify multi-session orchestration without the native build (macOS `flutter build` is blocked by a preexisting Xcode script-phase cycle, tracked separately).
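
The two session lanes and the fake-handle seam described in this commit message can be sketched roughly as follows. Everything here is a simplified assumption for illustration: the members of `ConversationHandle`, the `Session` and `Model` classes, and the handle factory are hypothetical and much smaller than the real FfiInferenceModel and FfiInferenceModelSession.

```dart
/// Abstraction each session depends on. The interface name comes from the
/// commit message; these two members are hypothetical.
abstract class ConversationHandle {
  Future<String> chat(String prompt);
  void close();
}

/// Fake handle that needs no native engine.
class FakeConversationHandle implements ConversationHandle {
  FakeConversationHandle(this.reply);
  final String reply;
  bool closed = false;

  @override
  Future<String> chat(String prompt) async => reply;

  @override
  void close() => closed = true;
}

/// Session bound to its own handle (hypothetical, simplified).
class Session {
  Session(this._handle);
  final ConversationHandle _handle;
  bool _closed = false;

  Future<String> chat(String prompt) {
    if (_closed) throw StateError('Session is closed');
    return _handle.chat(prompt);
  }

  void close() {
    if (_closed) return;
    _closed = true;
    _handle.close();
  }
}

/// Model with a legacy singleton lane and a detached lane (hypothetical).
class Model {
  Model(this._newHandle);
  final ConversationHandle Function() _newHandle;
  Session? _legacy;
  final Set<Session> _openSessions = {};

  /// Legacy lane: closes the previous session and overwrites it.
  Session createSession() {
    _legacy?.close();
    final session = Session(_newHandle());
    _legacy = session;
    return session;
  }

  /// Detached lane: every call adds an independent session.
  Session openSession() {
    final session = Session(_newHandle());
    _openSessions.add(session);
    return session;
  }

  /// Union of both lanes.
  List<Session> get sessions {
    final legacy = _legacy;
    return [if (legacy != null) legacy, ..._openSessions];
  }

  /// Cascade-closes every session in both lanes.
  void close() {
    for (final session in sessions) {
      session.close();
    }
    _legacy = null;
    _openSessions.clear();
  }
}
```

Because each `Session` holds its own handle, a test can hand out fakes with different replies and check that closing one session leaves the other usable.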

Wire openSession() into LiteRtLmWebInferenceModel (#226).

Each open session owns its own @litert-lm/core Conversation JS object: independent KV cache + history; the upstream Engine.createConversation() already supports multiple conversations per Engine.

- Extract _buildConversation() helper shared by createSession (legacy singleton, overwrites _session) and openSession (detached, appends to _openSessions). Keeps the sampler/preface/tools/thinking JS interop wiring in one place so the two lanes can't drift.
- Add _openSessions Set + sessions getter (union of legacy + open) + close() cascade-closes both lanes before engine.delete().
- Vision/audio remain force-disabled on both lanes (upstream @litert-lm/core@0.12.1 doesn't expose the executor setters).

Also fix a preexisting web-compile gap: litert_lm_client_stub.dart was missing shutdown(), which flutter_gemma_mobile.dart references via the PR #288 initializeFfiRuntime shutdownClient callback. The chrome web test wasn't run after #288 so it stayed latent; add the stub method.

Verified: flutter analyze clean; web library compiles in Chrome (test/web/litert_lm_web_test.dart); 393 host unit tests green.

…ng (#294) Concurrent sessions (#226): openSession()/openChat() serve independent dialogues on one loaded model — .litertlm (FFI native + web) and .task (MediaPipe Android/iOS). Concurrent contexts, serialized inference; maxConcurrentSessions cap. Web .litertlm inference via @litert-lm/core (early preview, text-only). Backend reporting (#288), getActiveModel restore (#227), web function calling. Reviewed (5 CRITICAL concurrency/error-handling fixes) and verified on macOS 23/23 incl. edge cases.

merlinnot added 2 commits May 21, 2026 22:08

feat: report initialized backend

6f24d14

test: cover active backend reporting

72a1d6a

merlinnot marked this pull request as ready for review May 21, 2026 20:37

merlinnot added 3 commits May 26, 2026 19:46

Address backend fallback review

990b18c

Expose backend init failures

1c3dbbe

Harden backend init exception API

26e9334

DenisovAV merged commit d56ce45 into DenisovAV:main May 26, 2026
3 checks passed

DenisovAV mentioned this pull request May 29, 2026

0.16.2: concurrent sessions, web LiteRT-LM inference, backend reporting #294

Merged

DenisovAV merged 5 commits into DenisovAV:main from merlinnot:codex/report-active-backend
