feat(extension): add raw OpenAI-compatible stream path for thinking-mode providers #149

Merged

lcomplete merged 1 commit into main from dev on Apr 27, 2026

Conversation

@lcomplete
Owner

Summary

  • Adds requiresRawOpenAICompatibleStream flag to ProviderMeta; set for qwen, zhipu, and minimax providers
  • Introduces a fetch-based streamOpenAICompatibleChatCompletion helper that can pass arbitrary request-body extras (e.g. enable_thinking) not exposed by the Vercel AI SDK
  • Routes providers with the flag through the new raw path in background.ts; all others keep the existing Vercel AI SDK path
  • Adds getThinkingModeOptions helper in thinkingMode.ts to build provider-specific thinking flags (see the sketch after this list)
  • Fixes the loading indicator in AssistantMessage to remain visible while reasoning content is streaming (not just before any parts arrive)
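As an illustration of how these pieces might fit together, here is a minimal sketch. The names ProviderMeta, requiresRawOpenAICompatibleStream, usesRawOpenAICompatibleStream, and getThinkingModeOptions come from this PR; the exact shapes and fields below are assumptions, not the PR's actual code.

// Sketch only: shapes are assumed, not copied from the source.
interface ProviderMeta {
  id: string;
  // When set, background.ts bypasses the Vercel AI SDK and uses the raw fetch path.
  requiresRawOpenAICompatibleStream?: boolean;
}

const qwenMeta: ProviderMeta = {
  id: "qwen",
  requiresRawOpenAICompatibleStream: true,
};

// Hypothetical routing predicate used when choosing a stream path.
function usesRawOpenAICompatibleStream(meta: ProviderMeta): boolean {
  return meta.requiresRawOpenAICompatibleStream === true;
}

// Hypothetical thinking-mode helper: always returns an explicit enable_thinking
// boolean to merge into the /chat/completions request body as an extra field.
function getThinkingModeOptions(thinkingEnabled: boolean): Record<string, unknown> {
  return { enable_thinking: thinkingEnabled };
}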

Test plan

  • yarn test passes in app/extension
  • Manual smoke test: qwen / zhipu / minimax providers stream responses with thinking mode enabled
  • Other providers (OpenAI, Anthropic, Ollama) unaffected — still use Vercel AI SDK path
  • Loading dots visible during reasoning phase, hidden once final text is rendered

🤖 Generated with Claude Code

feat(extension): add raw OpenAI-compatible stream path for thinking-mode providers

Introduce a direct fetch-based streaming pipeline for providers (qwen,
zhipu, minimax) that need explicit request-body flags such as
`enable_thinking`. Also improve the loading indicator so it stays visible
while reasoning output is in flight.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@augmentcode

augmentcode Bot commented Apr 27, 2026

🤖 Augment PR Summary

Summary: Adds a raw fetch-based OpenAI-compatible streaming path for providers that require explicit thinking-mode flags.

Changes:

  • Added requiresRawOpenAICompatibleStream to ProviderMeta and enabled it for qwen/zhipu/minimax.
  • Introduced usesRawOpenAICompatibleStream to decide when to bypass the Vercel AI SDK.
  • Implemented streamOpenAICompatibleChatCompletion to POST to /chat/completions and parse SSE deltas (content + reasoning).
  • Added getThinkingModeOptions to always send an explicit enable_thinking boolean.
  • Updated background.ts to route flagged providers through the raw stream and keep others on streamText.
  • Adjusted AssistantMessage so the loading indicator stays visible during reasoning/tool/step phases, and added Jest coverage.
Technical notes: The raw path caps output tokens at 8000 and extracts deltas from SSE `data:` frames (including reasoning_content/reasoning fields).
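As a rough illustration of the SSE handling described above: the field names content, reasoning_content, and reasoning follow the note, while the function below is a sketch rather than the PR's actual extractOpenAICompatibleStreamDelta.

interface OpenAICompatibleStreamDelta {
  text?: string;      // incremental answer text
  reasoning?: string; // incremental thinking output
  done: boolean;      // true once the stream sends [DONE]
}

// Sketch: parse the JSON payload of a single SSE "data:" frame.
function extractDeltaSketch(frame: string): OpenAICompatibleStreamDelta {
  if (frame.trim() === "[DONE]") {
    return { done: true };
  }
  const parsed = JSON.parse(frame);
  const delta = parsed.choices?.[0]?.delta ?? {};
  return {
    text: typeof delta.content === "string" ? delta.content : undefined,
    // Some providers stream thinking text as reasoning_content, others as reasoning.
    reasoning: delta.reasoning_content ?? delta.reasoning ?? undefined,
    done: false,
  };
}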


@augmentcode augmentcode Bot left a comment

Review completed. 3 suggestions posted.


Comment `augment review` to trigger a new review at any time.

const text = useMemo(() => getMessageText(message.parts), [message.parts]);
const lastResponseTextIndex = useMemo(
  () => findLastResponseTextIndex(message),
  [message]

@augmentcode augmentcode Bot Apr 27, 2026


app/extension/src/sidepanel/components/AssistantMessage.tsx:56 lastResponseTextIndex is memoized with [message]; if message.parts changes without replacing the message object, the memo can go stale and showPreparingResponse may not reflect the latest streamed parts.
Consider depending on message.parts (or deriving directly) so the loading indicator stays in sync with streaming updates.
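A minimal sketch of the suggested change, assuming the hook shape shown in the excerpt above:

// Sketch: depend on message.parts so the memo recomputes when parts are
// appended in place during streaming, keeping showPreparingResponse in sync.
const lastResponseTextIndex = useMemo(
  () => findLastResponseTextIndex(message),
  [message.parts]
);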

Severity: medium


const nextStreamState = applyStreamingPreviewChunk(streamState, chunk, {
  includeReasoning: includeReasoningPreview,
streamState = nextStreamState;
sendStreamingPreviewUpdate(

@augmentcode augmentcode Bot Apr 27, 2026


app/extension/src/background.ts:504 In the raw OpenAI-compatible path, sendStreamingPreviewUpdate isn’t wrapped in a try/catch like the Vercel AI path below, so a messaging failure could throw and abort the entire stream/task.
It may be safer to handle this error similarly so the stream stops gracefully instead of failing the run.
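A sketch of the suggested handling, mirroring the Vercel AI path; the call arguments are placeholders, not the PR's actual signature:

// Sketch: swallow messaging failures so a broken preview channel does not
// abort the raw stream or fail the whole task.
try {
  sendStreamingPreviewUpdate(/* same arguments as the current call */);
} catch (error) {
  console.warn("Streaming preview update failed; continuing stream", error);
}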

Severity: medium


dataLines = [];

const delta = extractOpenAICompatibleStreamDelta(eventData);
if (delta.done) {

@augmentcode augmentcode Bot Apr 27, 2026


app/extension/src/ai/openAICompatibleStream.ts:140 OpenAICompatibleStreamDelta includes a done flag, but onDelta is never invoked with done: true (the function returns early when it sees [DONE]).
This can surprise callers that expect a terminal callback to finalize UI/state.
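One way to address this, sketched under the assumption that onDelta accepts the same OpenAICompatibleStreamDelta shape:

// Sketch: emit an explicit terminal delta before returning so callers can
// finalize UI/state on a done: true callback.
if (eventData.trim() === "[DONE]") {
  onDelta({ done: true });
  return;
}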

Severity: low


@lcomplete lcomplete merged commit 2217099 into main Apr 27, 2026
1 check passed