feat(js/plugins/compat-oai): add OpenAI Responses API support #5237
DenisovAV wants to merge 4 commits into genkit-ai:main from
Conversation
Adds an opt-in `openAIResponses()` companion plugin and an
`openAI.responsesModel(...)` helper that target the OpenAI Responses
API (POST /v1/responses). Built-in tools (web_search_preview,
file_search, code_interpreter), reasoning summaries, and stateful
conversations via previous_response_id are supported. Citations from
built-in tools surface as `metadata.citations` on text Parts.
The new plugin is a sibling of the existing `openAI()` plugin and
mirrors the deepseek/xai pattern so other compat providers and the
existing Chat Completions path are unaffected. A regression test
verifies that constructing the deepseek/xai plugins does not load any
of the new responses/* files.
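For reviewers, a minimal usage sketch of the proposed surface. The import path, model name, and registration shape are assumptions based on the description above, not verbatim from the plugin:

```ts
import { genkit } from 'genkit';
import { openAI, openAIResponses } from '@genkit-ai/compat-oai/openai';

// openAI() keeps serving Chat Completions models; openAIResponses() adds the
// /v1/responses-backed namespace alongside it.
const ai = genkit({
  plugins: [openAI(), openAIResponses()],
});

// openAI.responsesModel(...) points a generate call at the Responses API.
const { text } = await ai.generate({
  model: openAI.responsesModel('gpt-5-mini'),
  prompt: 'Give me a one-paragraph overview of Genkit.',
});
console.log(text);
```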
Files:
- js/plugins/compat-oai/src/openai/responses/{types,request,response,
runner,stream,index}.ts — implementation (~1.4 KLOC)
- js/plugins/compat-oai/src/openai/index.ts — `openAIResponses()`
factory + `openAI.responsesModel(...)` helper
- js/plugins/compat-oai/tests/openai_responses_test.ts (28 cases)
- js/plugins/compat-oai/tests/openai_responses_stream_test.ts
(10 cases) — SSE event aggregator
- js/plugins/compat-oai/tests/compat_oai_isolation_test.ts — checks
responses/* are NOT loaded by deepseek/xai
- js/plugins/compat-oai/scripts/smoke_responses.ts — manual live-API
smoke (gated on OPENAI_API_KEY) covering plain text, web search
citations, streaming, and previousResponseId chaining
- README — new "Using the OpenAI Responses API" section
Discussion: genkit-ai#5236
Code Review
This pull request adds support for the OpenAI Responses API (/v1/responses) to the compat-oai plugin, introducing the openAIResponses() companion plugin and responsesModel() helper. These changes enable the use of built-in tools, reasoning models, and stateful conversation chaining. The review identified a critical configuration issue where systemRole must be set to true for reasoning models to ensure system messages reach the plugin's custom lifting logic. Additionally, a fix was suggested to handle undefined tool outputs during serialization to prevent invalid API requests.
```ts
    media: true,
    // o1/o3/gpt-5 reasoning family ignores `system` role; instructions
    // must go via the `instructions` config field instead.
    systemRole: false,
```
Setting 'systemRole: false' here will cause Genkit's core 'generate' logic to perform role mapping (e.g., converting 'system' messages to 'user' messages with a prefix) before the request reaches this plugin. This effectively bypasses the custom lifting logic implemented in 'toResponsesRequestBody' (which specifically looks for 'role === 'system''). To ensure your custom lifting to the 'instructions' field works as intended for reasoning models, you should set 'systemRole: true' here so the plugin receives the original system messages.
Suggested change:

```diff
-    systemRole: false,
+    systemRole: true,
```
Applied the change in 5f6077f + e912611 — systemRole: true is the right value.
Small correction on the mechanism though: I checked Genkit core (js/ai/src/) and systemRole is purely declarative metadata — core does not auto-convert system → user based on it. simulateSystemPrompt is opt-in middleware, not core behaviour. The end result is the same (the plugin lifts text-only system messages into instructions itself before the request leaves) so systemRole: true is the accurate self-description from a Genkit consumer's POV. Comment in types.ts updated to reflect the actual mechanism, and there's a new e2e test that exercises the lift against o3.
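For anyone following along, a rough sketch of the lift being described, with simplified stand-ins rather than the actual `toResponsesRequestBody` code:

```ts
// Minimal local shapes, for illustration only.
type Part = { text?: string };
type Message = { role: 'system' | 'user' | 'model' | 'tool'; content: Part[] };

// Text-only system messages become the top-level `instructions` field of the
// Responses request; everything else stays in the `input` array. The lift is
// skipped when the caller already provided config.instructions.
function liftSystemToInstructions(
  messages: Message[],
  configInstructions?: string
): { instructions?: string; input: Message[] } {
  if (configInstructions != null) {
    return { instructions: configInstructions, input: messages };
  }
  const isTextOnlySystem = (m: Message) =>
    m.role === 'system' && m.content.every((p) => typeof p.text === 'string');
  const instructions = messages
    .filter(isTextOnlySystem)
    .map((m) => m.content.map((p) => p.text).join(''))
    .join('\n');
  return {
    instructions: instructions || undefined,
    input: messages.filter((m) => !isTextOnlySystem(m)),
  };
}
```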
```ts
          typeof part.toolResponse.output === 'string'
            ? part.toolResponse.output
            : JSON.stringify(part.toolResponse.output),
        });
```
If 'part.toolResponse.output' is 'undefined', 'JSON.stringify(undefined)' will return 'undefined', which results in the 'output' property being omitted from the 'function_call_output' item. Since the OpenAI Responses API expects a string for the tool output, it is safer to provide a fallback value (like an empty JSON object string) to ensure the request remains valid.
Suggested change:

```ts
output:
  typeof part.toolResponse.output === 'string'
    ? part.toolResponse.output
    : JSON.stringify(part.toolResponse.output ?? {}),
```
Done in 5f6077f — output: JSON.stringify(part.toolResponse.output ?? {}). Added a unit test asserting output: '{}' for the undefined case.
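For reference, the assertion in spirit, kept self-contained here by mirroring the fixed expression instead of importing the actual request builder:

```ts
import assert from 'node:assert/strict';
import { test } from 'node:test';

// Mirrors the fix: JSON.stringify(undefined) returns undefined and would drop
// the `output` field entirely, so undefined now falls back to an empty object.
function serializeToolOutput(output: unknown): string {
  return typeof output === 'string' ? output : JSON.stringify(output ?? {});
}

test('undefined tool output serializes to "{}"', () => {
  assert.equal(serializeToolOutput(undefined), '{}');
  assert.equal(serializeToolOutput({ ok: true }), '{"ok":true}');
  assert.equal(serializeToolOutput('already a string'), 'already a string');
});
```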
Two issues from automated review on PR genkit-ai#5237:

1. Reasoning models (o1/o3/o4-mini/gpt-5*) now advertise `systemRole: true`. The plugin lifts text-only system messages into the top-level `instructions` field itself; advertising `systemRole: false` would cause Genkit core to convert system → user messages BEFORE the request reaches our resolver, defeating the lift. Added a unit test asserting `o3` and `o4-mini` carry `supports.systemRole === true`.

2. Tool messages with undefined `toolResponse.output` no longer drop the `output` field (which would happen because `JSON.stringify(undefined) === undefined`). They now emit `output: '{}'` so the API request stays valid. Added a unit test.
…g-model lift e2e test

The reasoning behind `systemRole: true` in REASONING_MODEL_INFO was mis-stated in the previous commit. Genkit core does not auto-transform messages based on `systemRole`; the value is purely declarative metadata for plugin consumers. The change is still correct (the plugin handles the system → instructions lift internally, so the model effectively does support the system role from a Genkit POV), but the inline comment now describes the actual mechanism instead of a non-existent core transformation.

Also updates the stale JSDoc on `toResponsesRequestBody`, which still referenced `supports.systemRole === false` as the lift trigger; the real trigger is `config.instructions == null`, applied for all Responses-namespaced models.

Adds an e2e unit test that exercises the lift end-to-end against a reasoning model id (`o3`) and asserts both that `instructions` is populated and that the input array carries no `system` role.
Thanks @DenisovAV! Are you working with anybody on the core team right now? If not, let me see who can help review and get this merged.
…t:check)

The previous commits ran prettier from the plugin's own node_modules, which slightly disagreed with the monorepo-root prettier 3.5.3 config on a couple of import-line breaks. Re-applies the canonical monorepo formatting to satisfy CI's `pnpm format:check`.
Summary
Adds opt-in OpenAI Responses API (`/v1/responses`) support to `@genkit-ai/compat-oai` via a sibling `openAIResponses()` plugin and an `openAI.responsesModel(...)` helper. Closes the gap discussed in #5236 (and previous #3640 / #4687 / #3574).

This is filed as a draft for design feedback before final review — see #5236 for the open question on sibling plugin vs prefix dispatch.
What's covered
- Non-streaming generation via `client.responses.create`
- Streaming via `client.responses.stream`, with a per-`output_index` SSE event aggregator (text deltas, reasoning summary deltas, function-call argument aggregation, built-in tool lifecycle progress, citation buffering)
- Built-in tools: `web_search_preview`, `file_search`, `code_interpreter`
- Text-only system messages are lifted into `instructions`; non-text system content stays in input
- Stateful conversations: `config.previousResponseId` → `previous_response_id`; the current turn's id surfaces on `response.custom.responseId` (see the usage sketch after this list)
- Citations surface as `metadata.citations` on text Parts. Discriminated `Citation` type — `url_citation | file_citation` — forward-compatible if Genkit ever adds a first-class citation Part.
- `store: false` (opposite of OpenAI's default `true`); documented in README and overridable per-call.
- Errors: `APIError` mapped to `GenkitError` (status mirroring), non-API errors wrapped as `INTERNAL` (or `CANCELLED` on abort), stream-level `error` events captured and re-thrown rather than silently truncating.
- Malformed function-call arguments are flagged with `metadata.malformedArguments: true`, plus a `logger.warn` (no silent corruption).
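To make that concrete, a usage sketch for built-in web search, citations, and response chaining. `previousResponseId`, `response.custom.responseId`, and `metadata.citations` come from the list above; the `builtInTools` config key is a hypothetical name pending the README:

```ts
import { genkit } from 'genkit';
import { openAI, openAIResponses } from '@genkit-ai/compat-oai/openai';

const ai = genkit({ plugins: [openAI(), openAIResponses()] });

// Turn 1: built-in web search; citations are expected on text parts.
// NOTE: `builtInTools` is an assumed config key for this sketch.
const first = await ai.generate({
  model: openAI.responsesModel('gpt-5-mini'),
  prompt: 'What changed in the latest Genkit release?',
  config: { builtInTools: [{ type: 'web_search_preview' }] },
});
for (const part of first.message?.content ?? []) {
  if (part.metadata?.citations) console.log(part.metadata.citations);
}

// Turn 2: chain on the previous response instead of resending history.
const responseId = (first.custom as { responseId?: string } | undefined)?.responseId;
const followUp = await ai.generate({
  model: openAI.responsesModel('gpt-5-mini'),
  prompt: 'Summarize that in one sentence.',
  config: { previousResponseId: responseId },
});
console.log(followUp.text);
```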
Isolation

- The shared `src/model.ts` is not modified.
- The existing provider directories (`src/deepseek/`, `src/xai/`) are not modified.
- A regression test (`tests/compat_oai_isolation_test.ts`) verifies that constructing the deepseek/xai plugins does not load any `responses/*` files.

Tests
- `pnpm check` clean (`tsc --noEmit`)
- `pnpm build` clean (tsup DTS + JS)
- Live smoke run against `gpt-5-mini` (`scripts/smoke_responses.ts`, gated on `OPENAI_API_KEY`): plain text, web-search citations, streaming, `previousResponseId` chaining — all green

Files
- `src/openai/responses/{types,request,response,runner,stream,index}.ts`
- `tests/openai_responses_test.ts`
- `tests/openai_responses_stream_test.ts`
- `tests/compat_oai_isolation_test.ts`
- `scripts/smoke_responses.ts`
- `src/openai/index.ts` — `openAIResponses()` factory, `openAI.responsesModel(...)` helper, type overloads
- `README.md` — new "Using the OpenAI Responses API" section
- No new dependencies: `openai` 4.x exposes `client.responses.*` already.

Open questions (from #5236)
- Sibling plugin vs prefix dispatch. Can be folded into `openAI()`'s resolver if preferred — moves namespace registration boilerplate but keeps the user surface (`openAI.responsesModel(...)`) identical.
- `metadata.citations` vs a first-class Part type. `metadata.citations` keeps Genkit core untouched and is forward-compatible. Open to filing a separate core RFC for first-class citations as a follow-up.
- `store: false` default. Opposite of OpenAI's default. Privacy-by-default for plugin users; documented. Happy to flip if maintainers prefer parity with OpenAI.

Test plan
- `pnpm test --filter @genkit-ai/compat-oai`
- `pnpm check --filter @genkit-ai/compat-oai`
- `pnpm build --filter @genkit-ai/compat-oai`
- Manual smoke run against `gpt-5-mini` (gated on `OPENAI_API_KEY`)
- Isolation test confirming deepseek/xai do not load `responses/*`