feat: report initialized backend #288
Thanks for this — the activeBackend getter + automatic fallback chain Three things I'd like to address before merging, all in the fallback
Smaller items, fine to fold into the same revision or leave for later:
Sasha
Thanks for the review, Sasha. I addressed the fallback loop feedback in the latest commits:
Validation:
Thanks Natan. CI green, all points addressed. Will fold into
Sasha
PR #288 added `activeBackend` as a required getter on `InferenceModel`. The web LiteRT-LM inference path (LiteRtLmWebInferenceModel) was introduced separately in release/0.16.2 and didn't pick up the new member during the merge — implement it as null (same as WebInferenceModel: the @litert-lm/core engine doesn't surface a final backend). Also document #288 in the 0.16.2 CHANGELOG entry.
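A minimal sketch of the shape this commit describes, assuming `activeBackend` is a nullable getter. The `Backend` enum and both class bodies are simplified stand-ins for illustration, not the plugin's actual declarations.

```dart
// Hypothetical stand-in for the plugin's backend enum.
enum Backend { cpu, gpu, npu }

// Simplified stand-in for the InferenceModel interface.
abstract class InferenceModel {
  /// Backend the runtime actually initialized, or null when unknown.
  Backend? get activeBackend;
}

// Web path: the JS engine does not report a final backend, so the
// getter returns null rather than guessing.
class LiteRtLmWebInferenceModel implements InferenceModel {
  @override
  Backend? get activeBackend => null;
}

void main() {
  final InferenceModel model = LiteRtLmWebInferenceModel();
  // Callers need to handle the "not reported" case explicitly.
  print(model.activeBackend ?? 'backend not reported');
}
```

Returning null keeps "unknown" distinct from any real backend value, so UI code can choose to hide the indicator instead of showing a wrong one.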
Wire openSession() into FfiInferenceModel (#226).

Each open session owns its own LiteRtLmConversationHandle: independent KV cache, history, and raw-response buffer.

- Extract ConversationHandle interface (litert_lm_client.dart) so the session depends on an abstraction the test layer can fake. LiteRtLmConversationHandle implements it.
- FfiInferenceModelSession now takes a ConversationHandle instead of the shared LiteRtLmFfiClient; routes chat/chatRaw/cancel/metrics/close through its own handle. Static extractTextFromResponse is unchanged.
- FfiInferenceModel: createSession() uses createConversationHandle() for the legacy singleton lane (overwrite + close old). New openSession() appends a detached session to _openSessions. sessions getter returns the union. close() cascade-closes every session in both lanes before engine shutdown.

Verification (host VM, no native engine needed):

- test/core/ffi/multi_session_test.dart: 6 tests using a fake ConversationHandle. Proves session-level isolation: two sessions with distinct handles produce distinct outputs ("I am A" vs "I am B"); close() of one doesn't touch the other; Gemma 4 raw-response capture; StateError after close; stopGeneration routes to the handle.
- flutter analyze clean; full 387-test suite green.

The fake-handle seam mirrors PR #288's injectable-client pattern and lets us verify multi-session orchestration without the native build (macOS `flutter build` is blocked by a preexisting Xcode script-phase cycle, tracked separately).
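The fake-handle seam can be illustrated with a small, self-contained sketch. Everything below is an assumption for illustration: the real `ConversationHandle` interface has more members (chatRaw, cancel, metrics), and `Session` and `FakeConversationHandle` are hypothetical names, not the classes in the test file.

```dart
// Simplified, hypothetical version of the handle abstraction.
abstract class ConversationHandle {
  Future<String> chat(String prompt);
  void close();
}

// Test double: returns a canned reply and records whether it was closed.
class FakeConversationHandle implements ConversationHandle {
  FakeConversationHandle(this.reply);
  final String reply;
  bool closed = false;

  @override
  Future<String> chat(String prompt) async => reply;

  @override
  void close() => closed = true;
}

// Session that depends only on the abstraction, not on a shared client.
class Session {
  Session(this._handle);
  final ConversationHandle _handle;
  bool _closed = false;

  Future<String> ask(String prompt) {
    if (_closed) throw StateError('Session is closed');
    return _handle.chat(prompt);
  }

  void close() {
    if (_closed) return;
    _closed = true;
    _handle.close();
  }
}

Future<void> main() async {
  final handleA = FakeConversationHandle('I am A');
  final handleB = FakeConversationHandle('I am B');
  final a = Session(handleA);
  final b = Session(handleB);

  print(await a.ask('who are you?')); // I am A
  a.close();
  print(handleB.closed); // false: closing A leaves B untouched
  print(await b.ask('who are you?')); // I am B
}
```

Because each session holds its own handle, isolation can be checked on the host VM without loading a native engine.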
Wire openSession() into LiteRtLmWebInferenceModel (#226).

Each open session owns its own @litert-lm/core Conversation JS object, with independent KV cache + history; the upstream Engine.createConversation() already supports multiple conversations per Engine.

- Extract _buildConversation() helper shared by createSession (legacy singleton, overwrites _session) and openSession (detached, appends to _openSessions). Keeps the sampler/preface/tools/thinking JS interop wiring in one place so the two lanes can't drift.
- Add _openSessions Set + sessions getter (union of legacy + open) + close() cascade-closes both lanes before engine.delete().
- Vision/audio remain force-disabled on both lanes (upstream @litert-lm/core@0.12.1 doesn't expose the executor setters).

Also fix a preexisting web-compile gap: litert_lm_client_stub.dart was missing shutdown(), which flutter_gemma_mobile.dart references via the PR #288 initializeFfiRuntime shutdownClient callback. The chrome web test wasn't run after #288 so it stayed latent; add the stub method.

Verified: flutter analyze clean; web library compiles in Chrome (test/web/litert_lm_web_test.dart); 393 host unit tests green.
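The two-lane bookkeeping described above might look roughly like the following sketch. `Conversation` and `Model` are hypothetical stand-ins with no JS interop; closing the old legacy session on overwrite is an assumption borrowed from the FFI commit, and the real `_buildConversation()` takes sampler, preface, and tool configuration that is omitted here.

```dart
// Hypothetical stand-in for a per-session conversation object.
class Conversation {
  bool closed = false;
  void close() => closed = true;
}

class Model {
  Conversation? _session; // legacy singleton lane
  final Set<Conversation> _openSessions = {}; // detached lane

  // Single construction path so both lanes share the same wiring.
  Conversation _buildConversation() => Conversation();

  Conversation createSession() {
    _session?.close(); // assumption: the old legacy session is closed
    return _session = _buildConversation();
  }

  Conversation openSession() {
    final session = _buildConversation();
    _openSessions.add(session);
    return session;
  }

  // Union of the legacy session (if any) and all detached sessions.
  List<Conversation> get sessions =>
      [if (_session != null) _session!, ..._openSessions];

  void close() {
    for (final session in sessions) {
      session.close();
    }
    _session = null;
    _openSessions.clear();
    // Engine teardown would follow here, after every session is closed.
  }
}

void main() {
  final model = Model();
  final legacy = model.createSession();
  final extra = model.openSession();
  print(model.sessions.length); // 2
  model.close();
  print(legacy.closed && extra.closed); // true
}
```

Routing both lanes through one builder means a change to session configuration only has to be made in one place.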
…ng (#294)

Concurrent sessions (#226): openSession()/openChat() serve independent dialogues on one loaded model, for .litertlm (FFI native + web) and .task (MediaPipe Android/iOS). Concurrent contexts, serialized inference; maxConcurrentSessions cap.

Web .litertlm inference via @litert-lm/core (early preview, text-only). Backend reporting (#288), getActiveModel restore (#227), web function calling.

Reviewed (5 CRITICAL concurrency/error-handling fixes) and verified on macOS 23/23 incl. edge cases.
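One way to picture "concurrent contexts, serialized inference" with a session cap is a future chain that admits one job at a time. This is only a sketch under stated assumptions: `SessionPool`, `open()`, and `run()` are hypothetical names, and only `maxConcurrentSessions` comes from the commit message. The plugin's actual mechanism may differ.

```dart
class SessionPool {
  SessionPool(this.maxConcurrentSessions);
  final int maxConcurrentSessions;
  int _open = 0;
  Future<void> _tail = Future<void>.value();

  // Registers a new session, enforcing the cap.
  void open() {
    if (_open >= maxConcurrentSessions) {
      throw StateError('Session cap reached');
    }
    _open++;
  }

  // Chains each job onto the previous one so only one runs at a time.
  Future<T> run<T>(Future<T> Function() job) {
    final result = _tail.then((_) => job());
    // Swallow errors on the chain itself so one failed job does not
    // block later ones; the caller still sees the error via `result`.
    _tail = result.then<void>((_) {}, onError: (_) {});
    return result;
  }
}

Future<void> main() async {
  final pool = SessionPool(2);
  pool.open();
  pool.open();

  final order = <String>[];
  final first = pool.run(() async {
    await Future<void>.delayed(const Duration(milliseconds: 20));
    order.add('first');
  });
  final second = pool.run(() async => order.add('second'));
  await Future.wait([first, second]);
  print(order); // [first, second]
}
```

Even though the second job is shorter, it waits for the first to finish, which is the serialization property the commit message refers to.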
Summary
- Add `InferenceModel.activeBackend` so callers can inspect the backend actually initialized by the runtime
- Make `.litertlm` initialization try backend fallback inside the plugin and return the backend that succeeded

Motivation
Callers can request a preferred backend, but the runtime may fall back. This gives applications a way to show users the compute backend actually in use without implementing their own fallback or probe loop.
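The fallback behaviour can be sketched as follows. This is an illustration, not the plugin's code: the `Backend` enum, `fallbackChain`, `initializeWithFallback`, and the `tryInit` hook are hypothetical names, and the CPU preference is assumed to have no further fallback.

```dart
// Hypothetical stand-in for the plugin's backend enum.
enum Backend { cpu, gpu, npu }

/// Ordered candidates for a preference; null is treated like GPU.
List<Backend> fallbackChain(Backend? preferred) {
  if (preferred == Backend.npu) {
    return [Backend.npu, Backend.gpu, Backend.cpu];
  }
  if (preferred == Backend.cpu) {
    return [Backend.cpu]; // assumption: nothing to fall back to
  }
  return [Backend.gpu, Backend.cpu]; // GPU or no preference
}

/// Tries each candidate in order and returns the first that initializes.
Future<Backend> initializeWithFallback(
  Backend? preferred,
  Future<void> Function(Backend) tryInit, // hypothetical init hook
) async {
  Object? lastError;
  for (final backend in fallbackChain(preferred)) {
    try {
      await tryInit(backend);
      return backend;
    } catch (e) {
      lastError = e; // remember the failure and try the next candidate
    }
  }
  throw StateError('No backend could be initialized: $lastError');
}

Future<void> main() async {
  // Simulate a device where only the CPU backend is available.
  final active = await initializeWithFallback(Backend.npu, (b) async {
    if (b != Backend.cpu) throw UnsupportedError('$b unavailable');
  });
  print(active); // Backend.cpu
}
```

The returned value is what a getter like `activeBackend` would report, so the application can display it without running its own probe loop.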
Tests
- `test/core/ffi/backend_preference_test.dart` covers NPU -> GPU -> CPU, GPU -> CPU, null -> GPU -> CPU, CPU-only fallback, and wire names
- `test/core/ffi/ffi_inference_model_test.dart` covers that the FFI model exposes the initialized backend

Validation
Notes
- `flutter pub get` updated local lockfile resolution for `meta` and `test_api` with this Flutter SDK, but those lockfile changes were intentionally not included in the commit.