Support /v1/responses for OpenAI models. #9795 #9796

dpage wants to merge 3 commits into pgadmin-org:master
Conversation
Walkthrough: OpenAIClient now supports both the OpenAI Chat Completions and Responses APIs. It normalizes a base URL and selects the API via an internal flag, falling back from Chat Completions to the Responses API when the model is incompatible.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant OpenAIClient
    participant ChatCompletionsAPI
    participant ResponsesAPI
    Client->>OpenAIClient: chat(messages, tools)
    rect rgba(100,150,200,0.5)
        Note over OpenAIClient: Attempt Chat Completions
        OpenAIClient->>OpenAIClient: _build_chat_payload()
        OpenAIClient->>ChatCompletionsAPI: POST /chat/completions
    end
    alt Chat Completions returns error (incompatible)
        ChatCompletionsAPI-->>OpenAIClient: LLMClientError
        rect rgba(200,150,100,0.5)
            Note over OpenAIClient: Fallback to Responses API
            OpenAIClient->>OpenAIClient: _build_responses_payload()
            OpenAIClient->>ResponsesAPI: POST /responses
            ResponsesAPI-->>OpenAIClient: Response / SSE events
            OpenAIClient->>OpenAIClient: _parse_responses_response() / _read_responses_stream()
            OpenAIClient-->>Client: LLMResponse
        end
    else Chat Completions succeeds
        ChatCompletionsAPI-->>OpenAIClient: Response / SSE events
        OpenAIClient->>OpenAIClient: _parse_response() / _read_openai_stream()
        OpenAIClient-->>Client: LLMResponse
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed

⚠️ The review ran into problems: timed out fetching pipeline failures after 30000ms.
🧹 Nitpick comments (6)
web/pgadmin/llm/providers/openai.py (6)
166-170: Preserve the exception chain with `raise ... from e`.

Same issue as in `chat()`: use exception chaining to preserve the original traceback.

♻️ Proposed fix

```diff
     except Exception as e:
         raise LLMClientError(LLMError(
-            message=f"Request failed: {str(e)}",
+            message=f"Request failed: {e!s}",
             provider=self.provider_name
-        ))
+        )) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` around lines 166 - 170, The except block that wraps exceptions into LLMClientError is losing the original traceback; update the handler to re-raise LLMClientError while preserving the exception chain by using "raise ... from e" when constructing the LLMClientError/LLMError (refer to the same except block that currently does "except Exception as e:" and uses LLMClientError and LLMError and self.provider_name); ensure the new raise statement includes "from e" so the original exception context is retained.
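As a general illustration of the chaining pattern the reviewer asks for (the `LLMError` and `LLMClientError` classes here are simplified stand-ins, not the project's actual definitions):

```python
class LLMError:
    """Simplified stand-in for the project's error payload."""
    def __init__(self, message, provider=None):
        self.message = message
        self.provider = provider


class LLMClientError(Exception):
    """Simplified stand-in for the project's client exception."""
    def __init__(self, error):
        super().__init__(error.message)
        self.error = error


def failing_request():
    raise ValueError("connection reset")


def wrap_request(fn, provider="openai"):
    """Call fn(); on failure, re-raise wrapped and chained to the original."""
    try:
        return fn()
    except Exception as e:
        # "from e" stores the original exception in __cause__, so the
        # traceback shows both the wrapper and the root cause.
        raise LLMClientError(LLMError(
            message=f"Request failed: {e!s}",
            provider=provider,
        )) from e
```

With chaining, `exc.__cause__` carries the original `ValueError`, which is what keeps the traceback useful for debugging.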
562-578: Consider extracting duplicated max_tokens error handling.

This error handling block (lines 563-578) is nearly identical to lines 471-486 in `_parse_response()`. Consider extracting a shared helper method to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` around lines 562 - 578, Duplicate max_tokens truncation handling in _parse_response() and the block checking "if not content and not tool_calls" should be extracted into a single helper (e.g., _raise_max_tokens_error or _handle_max_tokens_truncation) that builds and raises the LLMClientError(LLMError(...)). Move construction of the message (using StopReason.MAX_TOKENS, usage.input_tokens, self._model, self.provider_name, code='max_tokens', retryable=False) into that helper and replace both inline blocks with a call to it, passing the usage object (or input_tokens) and any context needed.
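A minimal sketch of the shared helper the reviewer proposes; the class and attribute names mirror those mentioned in the prompt but the surrounding scaffolding is hypothetical:

```python
class LLMClientError(Exception):
    """Simplified stand-in for the project's client exception."""
    def __init__(self, message, code=None, retryable=True):
        super().__init__(message)
        self.code = code
        self.retryable = retryable


class OpenAIClientSketch:
    provider_name = "openai"
    _model = "gpt-4o"  # illustrative model name

    def _raise_max_tokens_error(self, input_tokens):
        """Single place to build and raise the max_tokens truncation error,
        replacing the two near-identical inline blocks."""
        raise LLMClientError(
            f"Response from {self._model} ({self.provider_name}) was "
            f"truncated at the token limit after {input_tokens} input "
            f"tokens; increase max_tokens or shorten the prompt.",
            code="max_tokens",
            retryable=False,
        )
```

Both `_parse_response()` and the Responses-API parser would then call `self._raise_max_tokens_error(usage.input_tokens)` instead of duplicating the error construction.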
608-612: Preserve the exception chain with `raise ... from e`.

Consistent with the other handlers, use exception chaining here.

♻️ Proposed fix

```diff
     except Exception as e:
         raise LLMClientError(LLMError(
-            message=f"Streaming request failed: {str(e)}",
+            message=f"Streaming request failed: {e!s}",
             provider=self.provider_name
-        ))
+        )) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` around lines 608 - 612, The except block currently raises LLMClientError(LLMError(...)) without preserving the original exception; update the handler where the streaming request is caught (the block constructing LLMError with message f"Streaming request failed: {str(e)}" and provider self.provider_name) to re-raise using exception chaining (raise LLMClientError(LLMError(...)) from e) so the original traceback is preserved while keeping the same message and provider fields.
143-148: Preserve the exception chain with `raise ... from e`.

Catching broad `Exception` here loses the original traceback, making debugging harder. Use exception chaining to preserve context.

♻️ Proposed fix

```diff
     except Exception as e:
         raise LLMClientError(LLMError(
-            message=f"Request failed: {str(e)}",
+            message=f"Request failed: {e!s}",
             provider=self.provider_name
-        ))
+        )) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` around lines 143 - 148, In the except block in web/pgadmin/llm/providers/openai.py where you catch Exception as e and raise LLMClientError(LLMError(...)) (referencing LLMClientError, LLMError and self.provider_name), preserve the original traceback by chaining the exception—rethrow the LLMClientError using "raise ... from e" so the original exception e is attached to the new error for proper debugging context.
172-177: String-based error detection is fragile but pragmatic.

The detection relies on error-message substrings, which could break if OpenAI changes its error wording. Consider additionally checking `error.code` if OpenAI provides structured error codes for this case in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` around lines 172 - 177, The current _should_use_responses_api relies only on string-matching the error message; update it to also inspect a structured error code when available by reading getattr(error, "code", None) (or error.code) and returning True for known codes indicating model/endpoint incompatibility (e.g., model-not-supported/unsupported_model or whatever OpenAI uses), while retaining the existing substring checks on str(error) for backward compatibility; reference the _should_use_responses_api function and the error parameter when making this change.
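A sketch of the suggested two-tier detection; the structured error codes below are hypothetical placeholders, since the review itself notes OpenAI may not expose such codes for this case yet:

```python
class ApiError(Exception):
    """Simplified error carrying an optional structured code."""
    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code


# Hypothetical structured codes for model/endpoint incompatibility.
_INCOMPATIBLE_CODES = {"unsupported_model", "model_not_supported"}
# Substring fallback retained for backward compatibility.
_INCOMPATIBLE_SNIPPETS = ("v1/responses", "not a chat model")


def should_use_responses_api(error):
    """Prefer a structured code when present; fall back to message matching."""
    if getattr(error, "code", None) in _INCOMPATIBLE_CODES:
        return True
    message = str(error)
    return any(snippet in message for snippet in _INCOMPATIBLE_SNIPPETS)
```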
634-638: Also preserve the exception chain here.

This existing exception handler should also use `raise ... from e` for consistency.

♻️ Proposed fix

```diff
     except Exception as e:
         raise LLMClientError(LLMError(
-            message=f"Streaming request failed: {str(e)}",
+            message=f"Streaming request failed: {e!s}",
             provider=self.provider_name
-        ))
+        )) from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` around lines 634 - 638, In the except block that catches Exception e around the streaming request, preserve the exception chain by re-raising the LLMClientError from the original exception (use "raise LLMClientError(... ) from e"); update the raise that constructs LLMClientError(LLMError(..., provider=self.provider_name)) so it uses "from e" to maintain the original traceback and aid debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@web/pgadmin/llm/providers/openai.py`:
- Around line 166-170: The except block that wraps exceptions into
LLMClientError is losing the original traceback; update the handler to re-raise
LLMClientError while preserving the exception chain by using "raise ... from e"
when constructing the LLMClientError/LLMError (refer to the same except block
that currently does "except Exception as e:" and uses LLMClientError and
LLMError and self.provider_name); ensure the new raise statement includes "from
e" so the original exception context is retained.
- Around line 562-578: Duplicate max_tokens truncation handling in
_parse_response() and the block checking "if not content and not tool_calls"
should be extracted into a single helper (e.g., _raise_max_tokens_error or
_handle_max_tokens_truncation) that builds and raises the
LLMClientError(LLMError(...)). Move construction of the message (using
StopReason.MAX_TOKENS, usage.input_tokens, self._model, self.provider_name,
code='max_tokens', retryable=False) into that helper and replace both inline
blocks with a call to it, passing the usage object (or input_tokens) and any
context needed.
- Around line 608-612: The except block currently raises
LLMClientError(LLMError(...)) without preserving the original exception; update
the handler where the streaming request is caught (the block constructing
LLMError with message f"Streaming request failed: {str(e)}" and provider
self.provider_name) to re-raise using exception chaining (raise
LLMClientError(LLMError(...)) from e) so the original traceback is preserved
while keeping the same message and provider fields.
- Around line 143-148: In the except block in
web/pgadmin/llm/providers/openai.py where you catch Exception as e and raise
LLMClientError(LLMError(...)) (referencing LLMClientError, LLMError and
self.provider_name), preserve the original traceback by chaining the
exception—rethrow the LLMClientError using "raise ... from e" so the original
exception e is attached to the new error for proper debugging context.
- Around line 172-177: The current _should_use_responses_api relies only on
string-matching the error message; update it to also inspect a structured error
code when available by reading getattr(error, "code", None) (or error.code) and
returning True for known codes indicating model/endpoint incompatibility (e.g.,
model-not-supported/unsupported_model or whatever OpenAI uses), while retaining
the existing substring checks on str(error) for backward compatibility;
reference the _should_use_responses_api function and the error parameter when
making this change.
- Around line 634-638: In the except block that catches Exception e around the
streaming request, preserve the exception chain by re-raising the LLMClientError
from the original exception (use "raise LLMClientError(... ) from e"); update
the raise that constructs LLMClientError(LLMError(...,
provider=self.provider_name)) so it uses "from e" to maintain the original
traceback and aid debugging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9660b0cd-9bd8-4822-b6a9-fd3dfa3ccb25
📒 Files selected for processing (1)
web/pgadmin/llm/providers/openai.py
- Preserve exception chains with 'raise ... from e' in all exception handlers for better debugging tracebacks.
- Use f-string !s conversion instead of str() calls.
- Extract duplicated max_tokens error handling into a shared _raise_max_tokens_error() helper method.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@web/pgadmin/llm/providers/openai.py`:
- Around line 546-568: The code currently infers stop reasons only from
tool_calls or a coarse status == 'incomplete'; update both
_parse_responses_response and _read_responses_stream to inspect the final
response's status and, when status == 'incomplete', consult
data.get('incomplete_details', {}).get('reason') to choose the proper StopReason
(e.g., map 'max_tokens' -> StopReason.MAX_TOKENS, 'content_filter' ->
StopReason.CONTENT_FILTER, others -> StopReason.UNKNOWN); use that stop_reason
when deciding to call self._raise_max_tokens_error(usage.input_tokens) (only for
max_tokens) and to classify responses even if content or tool_calls were emitted
so incomplete statuses are not ignored. Ensure you reference the same variables
used now (status, data, usage, stop_reason) and update the StopReason assignment
logic in both parsing paths.
- Around line 71-79: Validate and normalize the provided api_url inside
__init__: ensure api_url (or DEFAULT_API_BASE_URL) is treated as a base URL by
stripping any known endpoint suffixes like '/chat/completions' or '/responses'
(or their trailing variants) before assigning to self._base_url, or raise a
ValueError if the value appears to already be a full endpoint; this prevents
_api_url from producing duplicated paths when _use_responses_api toggles between
'/responses' and '/chat/completions'.
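The normalization described above can be sketched as follows (the default URL and the suffix list are illustrative, not the project's actual constants):

```python
DEFAULT_API_BASE_URL = "https://api.openai.com/v1"  # illustrative default
_ENDPOINT_SUFFIXES = ("/chat/completions", "/responses")


def normalize_base_url(api_url=None):
    """Reduce a possibly full endpoint URL to a base URL, so that later
    joining '/responses' or '/chat/completions' cannot double the path."""
    base = (api_url or DEFAULT_API_BASE_URL).rstrip("/")
    for suffix in _ENDPOINT_SUFFIXES:
        if base.endswith(suffix):
            base = base[: -len(suffix)].rstrip("/")
            break
    return base
```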
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2c38230f-6ce8-46f4-8c68-c2ac6d65af02
📒 Files selected for processing (1)
web/pgadmin/llm/providers/openai.py
web/pgadmin/llm/providers/openai.py (Outdated)
```python
# Determine stop reason
status = data.get('status', '')
if tool_calls:
    stop_reason = StopReason.TOOL_USE
elif status == 'completed':
    stop_reason = StopReason.END_TURN
elif status == 'incomplete':
    stop_reason = StopReason.MAX_TOKENS
else:
    stop_reason = StopReason.UNKNOWN

# Parse usage information
usage_data = data.get('usage', {})
usage = Usage(
    input_tokens=usage_data.get('input_tokens', 0),
    output_tokens=usage_data.get('output_tokens', 0),
    total_tokens=usage_data.get('total_tokens', 0)
)

# Check for problematic responses
if not content and not tool_calls:
    if stop_reason == StopReason.MAX_TOKENS:
        self._raise_max_tokens_error(usage.input_tokens)
```
🧩 Analysis chain
🏁 Scripts executed (repository: pgadmin-org/pgadmin4):

```shell
# First, examine the file structure and get the file size
wc -l web/pgadmin/llm/providers/openai.py

# Read the _parse_responses_response method (lines 546-568)
sed -n '540,580p' web/pgadmin/llm/providers/openai.py | cat -n

# Read the _read_responses_stream method (lines 848-919)
sed -n '840,930p' web/pgadmin/llm/providers/openai.py | cat -n

# Search for incomplete_details usage in the file
rg 'incomplete_details' web/pgadmin/llm/providers/openai.py

# Search for the StopReason enum definition
rg -A 10 'class StopReason\|enum StopReason' web/pgadmin/llm/providers/

# Check for any existing handling of incomplete_details or status logic
rg 'status.*incomplete|incomplete.*reason' web/pgadmin/llm/providers/openai.py

# Look for the response.completed event handling and response structure
sed -n '836,920p' web/pgadmin/llm/providers/openai.py | cat -n

# Search for any mentions of failed or error in the streaming context
rg 'failed|error|refusal' web/pgadmin/llm/providers/openai.py | head -20

# Check imports and class definitions for context
head -50 web/pgadmin/llm/providers/openai.py | cat -n

# Look for the full _read_responses_stream method signature and context
sed -n '820,850p' web/pgadmin/llm/providers/openai.py | cat -n

# Search for the StopReason enum to see all available values
rg 'StopReason' web/pgadmin/llm/providers/openai.py -B 2 -A 15

# Search for any references to incomplete_details or response structure comments
rg 'response\.|incomplete|refus' web/pgadmin/llm/providers/openai.py -B 2 -A 2 | head -50

# Look at the full _parse_responses_response method to understand context better
sed -n '500,575p' web/pgadmin/llm/providers/openai.py | cat -n

# Check if there are any TODO or FIXME comments related to this
rg 'TODO|FIXME' web/pgadmin/llm/providers/openai.py

# Check the response.completed event structure in the streaming method
sed -n '880,920p' web/pgadmin/llm/providers/openai.py | cat -n
```
🌐 Web query:
OpenAI Responses API incomplete_details reason documentation
💡 Result:
In the OpenAI Responses API, the `incomplete_details` field in the Response object explains why a response is incomplete when the overall status is "incomplete". It is an object containing a `reason` field, whose documented values are:

- "max_output_tokens": the response was truncated because the max_output_tokens limit was reached.
- "content_filter": the response was stopped due to content filtering (e.g., safety violations).

This comes from the official API reference for retrieving a model response (e.g., GET /responses/{response_id}). Community reports note that when max_output_tokens is hit in batch processing or with reasoning models like GPT-5, the response may have status "incomplete" with incomplete_details: {"reason": "max_output_tokens"}; some users report bugs where incomplete_details is null even when expected. Note that this differs from older APIs such as Chat Completions (finish_reason: "length") and Assistants (run.incomplete_details.reason: "max_prompt_tokens"). The Responses API is newer and recommended for reasoning models like the GPT-5 series.
Citations:
- 1: https://developers.openai.com/api/reference/resources/responses/
- 2: https://developers.openai.com/api/docs/api-reference/responses/get
- 3: https://developers.openai.com/api/docs/api-reference/responses/object
- 4: https://developers.openai.com/api/reference/python/resources/responses/
- 5: https://developers.openai.com/api/reference/resources/responses/methods/retrieve
🌐 Web query:
OpenAI Responses API status incomplete_details field
💡 Result:
The `incomplete_details` field in the OpenAI Responses API is an optional object present on the Response object when the response status is "incomplete". It provides details on why response generation stopped prematurely.

Structure:
- reason: a string indicating the specific cause, such as "max_output_tokens" (when the max_output_tokens limit is reached, including during reasoning) or "content_filter" (when content is filtered).

Example handling:

```javascript
if (response.status === "incomplete" && response.incomplete_details?.reason === "max_output_tokens") {
  console.log("Ran out of tokens");
  if (response.output_text?.length > 0) {
    console.log("Partial output:", response.output_text);
  } else {
    console.log("Ran out of tokens during reasoning");
  }
}
```

Note: there are reported bugs where incomplete_details may be null even when status is "completed" but output is truncated (e.g., with GPT-5 models).
Citations:
- 1: https://developers.openai.com/api/docs/guides/reasoning/
- 2: https://developers.openai.com/api/reference/python/resources/responses/
- 3: https://developers.openai.com/api/reference/resources/responses/methods/retrieve
- 4: https://community.openai.com/t/empty-incomplete-details-from-gpt-5-with-max-output-tokens/1355725
Use the final response status and incomplete_details.reason in both parsers.
The Responses API provides incomplete_details.reason to distinguish between incomplete statuses. Currently, _parse_responses_response() treats all status == 'incomplete' as MAX_TOKENS, and _read_responses_stream() ignores the final response status entirely, inferring success only from emitted text or tool calls. This can misclassify actual content_filter refusals or other incomplete reasons as token-limit errors, and allows successful-looking responses with incomplete status to bypass error handling. Both code paths should check the final response status and use incomplete_details.reason (when present) to set the correct StopReason.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@web/pgadmin/llm/providers/openai.py` around lines 546 - 568, The code
currently infers stop reasons only from tool_calls or a coarse status ==
'incomplete'; update both _parse_responses_response and _read_responses_stream
to inspect the final response's status and, when status == 'incomplete', consult
data.get('incomplete_details', {}).get('reason') to choose the proper StopReason
(e.g., map 'max_tokens' -> StopReason.MAX_TOKENS, 'content_filter' ->
StopReason.CONTENT_FILTER, others -> StopReason.UNKNOWN); use that stop_reason
when deciding to call self._raise_max_tokens_error(usage.input_tokens) (only for
max_tokens) and to classify responses even if content or tool_calls were emitted
so incomplete statuses are not ignored. Ensure you reference the same variables
used now (status, data, usage, stop_reason) and update the StopReason assignment
logic in both parsing paths.
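Under the documented reason values above ("max_output_tokens", "content_filter"), the classification could look like this sketch (the `StopReason` enum is a simplified stand-in for the project's own):

```python
from enum import Enum


class StopReason(Enum):
    END_TURN = "end_turn"
    TOOL_USE = "tool_use"
    MAX_TOKENS = "max_tokens"
    CONTENT_FILTER = "content_filter"
    UNKNOWN = "unknown"


_INCOMPLETE_REASONS = {
    "max_output_tokens": StopReason.MAX_TOKENS,
    "content_filter": StopReason.CONTENT_FILTER,
}


def stop_reason_from_response(data, tool_calls):
    """Classify a Responses API payload, honoring incomplete_details.reason
    even when content or tool calls were emitted."""
    status = data.get("status", "")
    if status == "incomplete":
        # Check the incomplete reason first so an incomplete status is
        # never masked by partial content or tool calls.
        reason = (data.get("incomplete_details") or {}).get("reason")
        return _INCOMPLETE_REASONS.get(reason, StopReason.UNKNOWN)
    if tool_calls:
        return StopReason.TOOL_USE
    if status == "completed":
        return StopReason.END_TURN
    return StopReason.UNKNOWN
```

The caller would then invoke `self._raise_max_tokens_error(...)` only when the result is `StopReason.MAX_TOKENS`, and raise a content-filter error for `StopReason.CONTENT_FILTER`.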
- Strip known endpoint suffixes (/chat/completions, /responses) from api_url in __init__ to prevent doubled paths if a user provides a full endpoint URL instead of a base URL.
- Use incomplete_details.reason from the Responses API to properly distinguish between max_output_tokens and content_filter when the response status is 'incomplete', in both the non-streaming and streaming parsers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (4)
web/pgadmin/llm/providers/openai.py (4)
78: Thread-safety consideration for shared client instances.

The `_use_responses_api` flag is modified during `chat()` and `chat_stream()` calls. If the same `OpenAIClient` instance is shared across concurrent requests in a multi-threaded environment, this could cause race conditions where one thread's API selection affects another. If thread safety is needed, consider:
- Making the flag thread-local
- Passing it as a method parameter
- Documenting that clients should be per-request
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` at line 78, The _use_responses_api boolean on OpenAIClient is mutated inside chat() and chat_stream(), which creates a race condition if an OpenAIClient instance is reused across threads; make the selection thread-safe by removing shared mutable state: either convert _use_responses_api to thread-local storage (e.g., use threading.local within OpenAIClient), or stop mutating it and instead pass the selection as a parameter to chat()/chat_stream() (or compute it locally inside those methods), and update all call sites accordingly; reference _use_responses_api, chat(), chat_stream(), and OpenAIClient in your changes.
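The thread-local option can be sketched as below; the wrapper class and property names are illustrative, not the project's actual attributes:

```python
import threading


class _ApiSelection(threading.local):
    """Each thread that touches this object gets fresh state, because
    threading.local re-runs __init__ once per accessing thread."""
    def __init__(self):
        self.use_responses_api = False


class OpenAIClientSketch:
    def __init__(self):
        self._selection = _ApiSelection()

    @property
    def _use_responses_api(self):
        return self._selection.use_responses_api

    @_use_responses_api.setter
    def _use_responses_api(self, value):
        self._selection.use_responses_api = value
```

With this in place, a fallback triggered on one thread no longer flips the API choice for requests running concurrently on other threads.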
178-183: Error-message matching is fragile but acceptable.

The string-based detection in `_should_use_responses_api()` relies on specific OpenAI error-message text. If OpenAI changes the error wording, the fallback may not trigger. Consider documenting this behavior or adding alternative detection methods in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` around lines 178 - 183, The current _should_use_responses_api(error: LLMClientError) implementation relies on fragile string matching of OpenAI error messages; update the function by adding a clear docstring/comment documenting this fragility and the specific substrings being checked (e.g., 'v1/responses', 'not supported in the v1/chat/completions', 'not a chat model'), and add a TODO to implement more robust detection later (e.g., checking error codes/structured fields if available); keep the existing logic but annotate the function (and possibly add unit tests referencing _should_use_responses_api) so future maintainers know why it exists and where to improve it.
940-945: Inconsistent error handling between streaming and non-streaming paths.

The non-streaming `_parse_responses_response()` (lines 579-589) checks `stop_reason` before raising an error, calling `_raise_max_tokens_error()` for `MAX_TOKENS` and raising a content-filter error for `STOP_SEQUENCE`. The streaming path here raises a generic error regardless of `stop_reason`. For consistency, consider aligning the error handling:
♻️ Proposed fix for consistent error handling
```diff
         if not content and not tool_calls:
+            if stop_reason == StopReason.MAX_TOKENS:
+                self._raise_max_tokens_error(usage.input_tokens)
+            elif stop_reason == StopReason.STOP_SEQUENCE:
+                raise LLMClientError(LLMError(
+                    message='Response blocked by content filter.',
+                    code='content_filter',
+                    provider=self.provider_name,
+                    retryable=False
+                ))
             raise LLMClientError(LLMError(
                 message='No response content returned from API',
                 provider=self.provider_name,
                 retryable=False
             ))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` around lines 940 - 945, The streaming path currently raises a generic LLMClientError when both content and tool_calls are empty; instead, mirror the non-streaming logic in _parse_responses_response(): inspect the stop_reason from the response in the streaming handler (the function that contains this content/tool_calls check), call _raise_max_tokens_error() when stop_reason == MAX_TOKENS, and raise the content filter error when stop_reason == STOP_SEQUENCE, otherwise raise the existing generic 'No response content returned from API' error; reference and reuse _raise_max_tokens_error() and the same error construction used in _parse_responses_response() to keep behavior consistent across streaming and non-streaming paths.
817-822: Same inconsistency in the Chat Completions streaming path.

The `_read_openai_stream()` method also raises a generic error without checking `stop_reason`. Apply the same fix pattern here for consistency with `_parse_response()` (lines 492-503).

♻️ Proposed fix
```diff
         if not content and not tool_calls:
+            if stop_reason == StopReason.MAX_TOKENS:
+                self._raise_max_tokens_error(usage.input_tokens)
             raise LLMClientError(LLMError(
                 message='No response content returned from API',
                 provider=self.provider_name,
                 retryable=False
             ))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@web/pgadmin/llm/providers/openai.py` around lines 817 - 822, In _read_openai_stream(), when you hit the branch that currently raises LLMClientError for "No response content returned from API" (the case with no content and no tool_calls), inspect the stop_reason value and set the LLMError.retryable flag the same way as in _parse_response(): treat stop_reason == 'length' as retryable=True and all other stop reasons as retryable=False; include the provider (self.provider_name) and the stop_reason in the error message so callers get the same behavior and diagnostics as _parse_response().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@web/pgadmin/llm/providers/openai.py`:
- Line 78: The _use_responses_api boolean on OpenAIClient is mutated inside
chat() and chat_stream(), which creates a race condition if an OpenAIClient
instance is reused across threads; make the selection thread-safe by removing
shared mutable state: either convert _use_responses_api to thread-local storage
(e.g., use threading.local within OpenAIClient), or stop mutating it and instead
pass the selection as a parameter to chat()/chat_stream() (or compute it locally
inside those methods), and update all call sites accordingly; reference
_use_responses_api, chat(), chat_stream(), and OpenAIClient in your changes.
- Around line 178-183: The current _should_use_responses_api(error:
LLMClientError) implementation relies on fragile string matching of OpenAI error
messages; update the function by adding a clear docstring/comment documenting
this fragility and the specific substrings being checked (e.g., 'v1/responses',
'not supported in the v1/chat/completions', 'not a chat model'), and add a TODO
to implement more robust detection later (e.g., checking error codes/structured
fields if available); keep the existing logic but annotate the function (and
possibly add unit tests referencing _should_use_responses_api) so future
maintainers know why it exists and where to improve it.
- Around line 940-945: The streaming path currently raises a generic
LLMClientError when both content and tool_calls are empty; instead, mirror the
non-streaming logic in _parse_responses_response(): inspect the stop_reason from
the response in the streaming handler (the function that contains this
content/tool_calls check), call _raise_max_tokens_error() when stop_reason ==
MAX_TOKENS, and raise the content filter error when stop_reason ==
STOP_SEQUENCE, otherwise raise the existing generic 'No response content
returned from API' error; reference and reuse _raise_max_tokens_error() and the
same error construction used in _parse_responses_response() to keep behavior
consistent across streaming and non-streaming paths.
- Around line 817-822: In _read_openai_stream(), when you hit the branch that
currently raises LLMClientError for "No response content returned from API" (the
case with no content and no tool_calls), inspect the stop_reason value and set
the LLMError.retryable flag the same way as in _parse_response(): treat
stop_reason == 'length' as retryable=True and all other stop reasons as
retryable=False; include the provider (self.provider_name) and the stop_reason
in the error message so callers get the same behavior and diagnostics as
_parse_response().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1a908a3c-734e-4b5d-9b66-0cb1fc81ea9e
📒 Files selected for processing (1)
web/pgadmin/llm/providers/openai.py
#9795