
Support /v1/responses for OpenAI models. #9795 #9796

Open
dpage wants to merge 3 commits into pgadmin-org:master from dpage:openai-api-compat

Conversation

@dpage
Contributor

@dpage dpage commented Mar 27, 2026

#9795

Summary by CodeRabbit

  • New Features

    • Support for OpenAI's Responses API alongside Chat Completions, with automatic fallback when compatibility issues occur.
    • More robust streaming with progressive text deltas and reconstructed function/tool outputs.
  • Refactor

    • Unified request/response handling and centralized token-limit error reporting for more consistent behavior.
  • Bug Fixes

    • Improved retry and error handling for streaming and non-streaming interactions.

@coderabbitai

coderabbitai bot commented Mar 27, 2026

Walkthrough

OpenAIClient now supports both OpenAI Chat Completions and Responses APIs. It normalizes a base URL, selects the API via a _use_responses_api flag (with automatic fallback), builds API-specific payloads, parses both sync and SSE streaming formats, and centralizes token-limit error handling.
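The endpoint selection described in the walkthrough can be sketched in isolation. This is a minimal illustration, not the actual pgAdmin code: only `_base_url` and `_use_responses_api` are names taken from the review; everything else is hypothetical.

```python
class OpenAIClientSketch:
    """Minimal sketch of the dual-endpoint selection described above."""

    DEFAULT_API_BASE_URL = "https://api.openai.com/v1"

    def __init__(self, api_url=None):
        # Normalize: treat the configured URL as a base without a
        # trailing slash, so endpoint paths can be appended cleanly.
        self._base_url = (api_url or self.DEFAULT_API_BASE_URL).rstrip("/")
        self._use_responses_api = False  # start with Chat Completions

    @property
    def _api_url(self):
        # The active endpoint is computed from the flag, so flipping it
        # (e.g. after an incompatibility error) redirects later requests.
        suffix = "/responses" if self._use_responses_api else "/chat/completions"
        return self._base_url + suffix
```

With this shape, the fallback logic only has to flip one flag and retry; every subsequent request is built against the other endpoint.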

Changes

Cohort / File(s): OpenAI Client Dual-API Support — web/pgadmin/llm/providers/openai.py
Summary: Reworked OpenAIClient to accept a normalized _base_url and choose between the Chat Completions and Responses APIs via _use_responses_api and a computed _api_url. Added payload builders (_build_chat_payload, _build_responses_payload), message/tool converters for Responses, response parsers (_parse_responses_response), a streaming reader for Responses (_read_responses_stream), SSE parsing, retry/fallback logic from Chat Completions to Responses, and centralized token-limit handling (_raise_max_tokens_error).

Sequence Diagram

sequenceDiagram
    participant Client
    participant OpenAIClient
    participant ChatCompletionsAPI
    participant ResponsesAPI

    Client->>OpenAIClient: chat(messages, tools)

    rect rgba(100,150,200,0.5)
    Note over OpenAIClient: Attempt Chat Completions
    OpenAIClient->>OpenAIClient: _build_chat_payload()
    OpenAIClient->>ChatCompletionsAPI: POST /chat/completions
    end

    alt Chat Completions returns error (incompatible)
        ChatCompletionsAPI-->>OpenAIClient: LLMClientError
        rect rgba(200,150,100,0.5)
        Note over OpenAIClient: Fallback to Responses API
        OpenAIClient->>OpenAIClient: _build_responses_payload()
        OpenAIClient->>ResponsesAPI: POST /responses
        ResponsesAPI-->>OpenAIClient: Response / SSE events
        OpenAIClient->>OpenAIClient: _parse_responses_response() / _read_responses_stream()
        OpenAIClient-->>Client: LLMResponse
        end
    else Chat Completions succeeds
        ChatCompletionsAPI-->>OpenAIClient: Response / SSE events
        OpenAIClient->>OpenAIClient: _parse_response() / _read_openai_stream()
        OpenAIClient-->>Client: LLMResponse
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The pull request title directly and clearly summarizes the main change: adding support for the newer OpenAI Responses API endpoint alongside the existing Chat Completions endpoint.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 95.24%, which is sufficient. The required threshold is 80.00%.



Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms




@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (6)
web/pgadmin/llm/providers/openai.py (6)

166-170: Preserve exception chain with raise ... from e.

Same issue as in chat() - use exception chaining to preserve the original traceback.

♻️ Proposed fix
         except Exception as e:
             raise LLMClientError(LLMError(
-                message=f"Request failed: {str(e)}",
+                message=f"Request failed: {e!s}",
                 provider=self.provider_name
-            ))
+            )) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 166 - 170, The except block
that wraps exceptions into LLMClientError is losing the original traceback;
update the handler to re-raise LLMClientError while preserving the exception
chain by using "raise ... from e" when constructing the LLMClientError/LLMError
(refer to the same except block that currently does "except Exception as e:" and
uses LLMClientError and LLMError and self.provider_name); ensure the new raise
statement includes "from e" so the original exception context is retained.
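The `raise ... from e` pattern suggested above can be demonstrated in isolation. `LLMClientError` here is a bare stand-in for the project's error wrapper, not its real definition:

```python
class LLMClientError(Exception):
    """Stand-in for the project's error wrapper class."""


def do_request():
    try:
        raise ConnectionError("socket closed")  # simulated transport failure
    except Exception as e:
        # "from e" attaches the original exception as __cause__, so the
        # full traceback chain survives into logs and debuggers instead
        # of being flattened into a single message string.
        raise LLMClientError(f"Request failed: {e!s}") from e
```

Without `from e`, Python still records an implicit `__context__`, but an explicit chain both documents intent and survives `raise ... from None` suppression elsewhere in the handler.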

562-578: Consider extracting duplicated max_tokens error handling.

This error handling block (lines 563-578) is nearly identical to lines 471-486 in _parse_response(). Consider extracting to a shared helper method to reduce duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 562 - 578, Duplicate
max_tokens truncation handling in _parse_response() and the block checking "if
not content and not tool_calls" should be extracted into a single helper (e.g.,
_raise_max_tokens_error or _handle_max_tokens_truncation) that builds and raises
the LLMClientError(LLMError(...)). Move construction of the message (using
StopReason.MAX_TOKENS, usage.input_tokens, self._model, self.provider_name,
code='max_tokens', retryable=False) into that helper and replace both inline
blocks with a call to it, passing the usage object (or input_tokens) and any
context needed.

608-612: Preserve exception chain with raise ... from e.

Consistent with the other handlers, use exception chaining here.

♻️ Proposed fix
             except Exception as e:
                 raise LLMClientError(LLMError(
-                    message=f"Streaming request failed: {str(e)}",
+                    message=f"Streaming request failed: {e!s}",
                     provider=self.provider_name
-                ))
+                )) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 608 - 612, The except block
currently raises LLMClientError(LLMError(...)) without preserving the original
exception; update the handler where the streaming request is caught (the block
constructing LLMError with message f"Streaming request failed: {str(e)}" and
provider self.provider_name) to re-raise using exception chaining (raise
LLMClientError(LLMError(...)) from e) so the original traceback is preserved
while keeping the same message and provider fields.

143-148: Preserve exception chain with raise ... from e.

Catching broad Exception here loses the original traceback, making debugging harder. Use exception chaining to preserve context.

♻️ Proposed fix
         except Exception as e:
             raise LLMClientError(LLMError(
-                message=f"Request failed: {str(e)}",
+                message=f"Request failed: {e!s}",
                 provider=self.provider_name
-            ))
+            )) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 143 - 148, In the except
block in web/pgadmin/llm/providers/openai.py where you catch Exception as e and
raise LLMClientError(LLMError(...)) (referencing LLMClientError, LLMError and
self.provider_name), preserve the original traceback by chaining the
exception—rethrow the LLMClientError using "raise ... from e" so the original
exception e is attached to the new error for proper debugging context.

172-177: String-based error detection is fragile but pragmatic.

The detection relies on error message substrings which could break if OpenAI changes their error wording. Consider additionally checking error.code if OpenAI provides structured error codes for this case in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 172 - 177, The current
_should_use_responses_api relies only on string-matching the error message;
update it to also inspect a structured error code when available by reading
getattr(error, "code", None) (or error.code) and returning True for known codes
indicating model/endpoint incompatibility (e.g.,
model-not-supported/unsupported_model or whatever OpenAI uses), while retaining
the existing substring checks on str(error) for backward compatibility;
reference the _should_use_responses_api function and the error parameter when
making this change.
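A combined check along these lines could read a structured code first and fall back to substrings. Note the error codes and message fragments below are illustrative placeholders, not confirmed OpenAI values; they would need to be verified against real error responses before use:

```python
# Hypothetical codes/substrings -- confirm against actual OpenAI errors.
_RESPONSES_ONLY_CODES = {"unsupported_model", "model_not_supported"}
_RESPONSES_ONLY_SUBSTRINGS = (
    "v1/responses",
    "not supported in the v1/chat/completions",
    "not a chat model",
)


def should_use_responses_api(error):
    # Prefer a structured code when the error object carries one...
    code = getattr(error, "code", None)
    if code in _RESPONSES_ONLY_CODES:
        return True
    # ...and keep substring matching as a backward-compatible fallback.
    text = str(error).lower()
    return any(s in text for s in _RESPONSES_ONLY_SUBSTRINGS)
```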

634-638: Also preserve exception chain here.

This existing exception handler should also use raise ... from e for consistency.

♻️ Proposed fix
         except Exception as e:
             raise LLMClientError(LLMError(
-                message=f"Streaming request failed: {str(e)}",
+                message=f"Streaming request failed: {e!s}",
                 provider=self.provider_name
-            ))
+            )) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 634 - 638, In the except
block that catches Exception e around the streaming request, preserve the
exception chain by re-raising the LLMClientError from the original exception
(use "raise LLMClientError(... ) from e"); update the raise that constructs
LLMClientError(LLMError(..., provider=self.provider_name)) so it uses "from e"
to maintain the original traceback and aid debugging.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9660b0cd-9bd8-4822-b6a9-fd3dfa3ccb25

📥 Commits

Reviewing files that changed from the base of the PR and between aa3dc38 and 87ab524.

📒 Files selected for processing (1)
  • web/pgadmin/llm/providers/openai.py

- Preserve exception chains with 'raise ... from e' in all
  exception handlers for better debugging tracebacks.
- Use f-string !s conversion instead of str() calls.
- Extract duplicated max_tokens error handling into a shared
  _raise_max_tokens_error() helper method.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web/pgadmin/llm/providers/openai.py`:
- Around line 546-568: The code currently infers stop reasons only from
tool_calls or a coarse status == 'incomplete'; update both
_parse_responses_response and _read_responses_stream to inspect the final
response's status and, when status == 'incomplete', consult
data.get('incomplete_details', {}).get('reason') to choose the proper StopReason
(e.g., map 'max_tokens' -> StopReason.MAX_TOKENS, 'content_filter' ->
StopReason.CONTENT_FILTER, others -> StopReason.UNKNOWN); use that stop_reason
when deciding to call self._raise_max_tokens_error(usage.input_tokens) (only for
max_tokens) and to classify responses even if content or tool_calls were emitted
so incomplete statuses are not ignored. Ensure you reference the same variables
used now (status, data, usage, stop_reason) and update the StopReason assignment
logic in both parsing paths.
- Around line 71-79: Validate and normalize the provided api_url inside
__init__: ensure api_url (or DEFAULT_API_BASE_URL) is treated as a base URL by
stripping any known endpoint suffixes like '/chat/completions' or '/responses'
(or their trailing variants) before assigning to self._base_url, or raise a
ValueError if the value appears to already be a full endpoint; this prevents
_api_url from producing duplicated paths when _use_responses_api toggles between
'/responses' and '/chat/completions'.
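The base-URL normalization asked for in the second comment can be sketched as a small helper; the function name and suffix list are illustrative, drawn from the endpoints named in the comment:

```python
# Endpoint suffixes that users sometimes paste as part of the "base" URL.
KNOWN_ENDPOINT_SUFFIXES = ("/chat/completions", "/responses")


def normalize_base_url(api_url):
    """Strip known endpoint suffixes so the computed endpoint URL
    never doubles a path like .../responses/responses."""
    base = api_url.rstrip("/")
    for suffix in KNOWN_ENDPOINT_SUFFIXES:
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return base
```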

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2c38230f-6ce8-46f4-8c68-c2ac6d65af02

📥 Commits

Reviewing files that changed from the base of the PR and between 87ab524 and 4959c75.

📒 Files selected for processing (1)
  • web/pgadmin/llm/providers/openai.py

Comment on lines +546 to +568
# Determine stop reason
status = data.get('status', '')
if tool_calls:
    stop_reason = StopReason.TOOL_USE
elif status == 'completed':
    stop_reason = StopReason.END_TURN
elif status == 'incomplete':
    stop_reason = StopReason.MAX_TOKENS
else:
    stop_reason = StopReason.UNKNOWN

# Parse usage information
usage_data = data.get('usage', {})
usage = Usage(
    input_tokens=usage_data.get('input_tokens', 0),
    output_tokens=usage_data.get('output_tokens', 0),
    total_tokens=usage_data.get('total_tokens', 0)
)

# Check for problematic responses
if not content and not tool_calls:
    if stop_reason == StopReason.MAX_TOKENS:
        self._raise_max_tokens_error(usage.input_tokens)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file structure and get file size
wc -l web/pgadmin/llm/providers/openai.py

Repository: pgadmin-org/pgadmin4

Length of output: 103


🏁 Script executed:

# Read the _parse_responses_response method (lines 546-568)
sed -n '540,580p' web/pgadmin/llm/providers/openai.py | cat -n

Repository: pgadmin-org/pgadmin4

Length of output: 1708


🏁 Script executed:

# Read the _read_responses_stream method (lines 848-919)
sed -n '840,930p' web/pgadmin/llm/providers/openai.py | cat -n

Repository: pgadmin-org/pgadmin4

Length of output: 3464


🏁 Script executed:

# Search for incomplete_details usage in the file
rg 'incomplete_details' web/pgadmin/llm/providers/openai.py

Repository: pgadmin-org/pgadmin4

Length of output: 46


🏁 Script executed:

# Search for StopReason enum definition
rg -A 10 'class StopReason\|enum StopReason' web/pgadmin/llm/providers/

Repository: pgadmin-org/pgadmin4

Length of output: 46


🏁 Script executed:

# Check for any existing handling of incomplete_details or status logic
rg 'status.*incomplete|incomplete.*reason' web/pgadmin/llm/providers/openai.py

Repository: pgadmin-org/pgadmin4

Length of output: 100


🏁 Script executed:

# Look for the response.completed event handling and response structure
sed -n '836,920p' web/pgadmin/llm/providers/openai.py | cat -n

Repository: pgadmin-org/pgadmin4

Length of output: 3637


🏁 Script executed:

# Search for any mentions of failed or error in the streaming context
rg 'failed|error|refusal' web/pgadmin/llm/providers/openai.py | head -20

Repository: pgadmin-org/pgadmin4

Length of output: 1139


🏁 Script executed:

# Check imports and class definitions for context
head -50 web/pgadmin/llm/providers/openai.py | cat -n

Repository: pgadmin-org/pgadmin4

Length of output: 1759


🏁 Script executed:

# Look for the full _read_responses_stream method signature and context
sed -n '820,850p' web/pgadmin/llm/providers/openai.py | cat -n

Repository: pgadmin-org/pgadmin4

Length of output: 1168


🏁 Script executed:

# Search for StopReason enum to see all available values
rg 'StopReason' web/pgadmin/llm/providers/openai.py -B 2 -A 15

Repository: pgadmin-org/pgadmin4

Length of output: 4469


🏁 Script executed:

# Search for any references to incomplete_details or response structure comments
rg 'response\.|incomplete|refus' web/pgadmin/llm/providers/openai.py -B 2 -A 2 | head -50

Repository: pgadmin-org/pgadmin4

Length of output: 1597


🏁 Script executed:

# Look at the full _parse_responses_response method to understand context better
sed -n '500,575p' web/pgadmin/llm/providers/openai.py | cat -n

Repository: pgadmin-org/pgadmin4

Length of output: 3232


🏁 Script executed:

# Check if there are any TODO or FIXME comments related to this
rg 'TODO|FIXME' web/pgadmin/llm/providers/openai.py

Repository: pgadmin-org/pgadmin4

Length of output: 46


🏁 Script executed:

# Check what the response.completed event structure looks like in the streaming method
sed -n '880,920p' web/pgadmin/llm/providers/openai.py | cat -n

Repository: pgadmin-org/pgadmin4

Length of output: 1696


🌐 Web query:

OpenAI Responses API incomplete_details reason documentation

💡 Result:

In the OpenAI Responses API, the incomplete_details field in the Response object provides details about why a response is incomplete when the overall status is "incomplete". It is an object containing a reason field. The documented possible values for incomplete_details.reason are: - "max_output_tokens": The response was truncated because the max_output_tokens limit was reached. - "content_filter": The response was stopped due to content filtering (e.g., safety violations). This is from the official API reference for retrieving a model response (e.g., GET /responses/{response_id}). Example usage from community reports: When max_output_tokens is hit in batch processing or with reasoning models like GPT-5, the response may have status "incomplete" with incomplete_details: {"reason": "max_output_tokens"}. Some users report bugs where incomplete_details is null even when expected. Note: This differs from older APIs like Chat Completions (finish_reason: "length") or Assistants (run.incomplete_details.reason: "max_prompt_tokens"). The Responses API is newer, recommended for reasoning models like GPT-5 series.



🌐 Web query:

OpenAI Responses API status incomplete_details field

💡 Result:

The incomplete_details field in the OpenAI Responses API is an optional object present in the Response object when the response status is "incomplete". It provides details on why the response generation stopped prematurely. Structure: - reason: string indicating the specific cause, such as "max_output_tokens" (when the max_output_tokens limit is reached, including during reasoning) or "content_filter" (when content is filtered). Example usage for handling: if (response.status === "incomplete" && response.incomplete_details?.reason === "max_output_tokens") { console.log("Ran out of tokens"); if (response.output_text?.length > 0) { console.log("Partial output:", response.output_text); } else { console.log("Ran out of tokens during reasoning"); } } Note: There are reported bugs where incomplete_details may be null even when status is "completed" but output is truncated (e.g., with GPT-5 models).



Use the final response status and incomplete_details.reason in both parsers.

The Responses API provides incomplete_details.reason to distinguish between incomplete statuses. Currently, _parse_responses_response() treats all status == 'incomplete' as MAX_TOKENS, and _read_responses_stream() ignores the final response status entirely, inferring success only from emitted text or tool calls. This can misclassify actual content_filter refusals or other incomplete reasons as token-limit errors, and allows successful-looking responses with incomplete status to bypass error handling. Both code paths should check the final response status and use incomplete_details.reason (when present) to set the correct StopReason.

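The mapping asked for above can be sketched as a pure function. The `StopReason` members mirror the names used in the review comment, and the reason strings (`max_output_tokens`, `content_filter`) are the ones documented for the Responses API per the web query results quoted earlier; the function name itself is hypothetical:

```python
from enum import Enum


class StopReason(Enum):  # mirrors the enum named in the review
    END_TURN = "end_turn"
    TOOL_USE = "tool_use"
    MAX_TOKENS = "max_tokens"
    CONTENT_FILTER = "content_filter"
    UNKNOWN = "unknown"


# Documented incomplete_details.reason values for the Responses API.
_INCOMPLETE_REASON_MAP = {
    "max_output_tokens": StopReason.MAX_TOKENS,
    "content_filter": StopReason.CONTENT_FILTER,
}


def stop_reason_from_response(data, tool_calls):
    status = data.get("status", "")
    if tool_calls:
        return StopReason.TOOL_USE
    if status == "completed":
        return StopReason.END_TURN
    if status == "incomplete":
        # incomplete_details may be null in some reported cases, so
        # guard with "or {}" before reading the reason.
        reason = (data.get("incomplete_details") or {}).get("reason")
        return _INCOMPLETE_REASON_MAP.get(reason, StopReason.UNKNOWN)
    return StopReason.UNKNOWN
```

Both the non-streaming parser and the stream reader could call this one function on the final response object, which removes the divergence the review flags.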

- Strip known endpoint suffixes (/chat/completions, /responses) from
  api_url in __init__ to prevent doubled paths if a user provides a
  full endpoint URL instead of a base URL.
- Use incomplete_details.reason from the Responses API to properly
  distinguish between max_output_tokens and content_filter when the
  response status is 'incomplete', in both the non-streaming and
  streaming parsers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (4)
web/pgadmin/llm/providers/openai.py (4)

78-78: Thread safety consideration for shared client instances.

The _use_responses_api flag is modified during chat() and chat_stream() calls. If the same OpenAIClient instance is shared across concurrent requests in a multi-threaded environment, this could cause race conditions where one thread's API selection affects another.

If thread safety is needed, consider:

  1. Making the flag thread-local
  2. Passing it as a method parameter
  3. Documenting that clients should be per-request
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` at line 78, The _use_responses_api
boolean on OpenAIClient is mutated inside chat() and chat_stream(), which
creates a race condition if an OpenAIClient instance is reused across threads;
make the selection thread-safe by removing shared mutable state: either convert
_use_responses_api to thread-local storage (e.g., use threading.local within
OpenAIClient), or stop mutating it and instead pass the selection as a parameter
to chat()/chat_stream() (or compute it locally inside those methods), and update
all call sites accordingly; reference _use_responses_api, chat(), chat_stream(),
and OpenAIClient in your changes.
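The thread-local option suggested above can be sketched with `threading.local`; the wrapper class and its names are hypothetical, illustrating the idea rather than pgAdmin's code:

```python
import threading


class ApiSelection:
    """Thread-local holder for the Chat-vs-Responses selection, so one
    request's fallback cannot flip the endpoint for another thread."""

    def __init__(self, default=False):
        self._default = default
        self._local = threading.local()

    @property
    def use_responses_api(self):
        # Each thread sees its own value, or the shared default.
        return getattr(self._local, "value", self._default)

    @use_responses_api.setter
    def use_responses_api(self, value):
        self._local.value = value
```

Passing the selection as a method parameter (option 2 in the list above) is simpler still and avoids hidden state entirely, at the cost of threading it through every call site.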

178-183: Error message matching is fragile but acceptable.

The string-based detection in _should_use_responses_api() relies on specific OpenAI error message text. If OpenAI changes error wording, the fallback may not trigger. Consider documenting this behavior or adding alternative detection methods in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 178 - 183, The current
_should_use_responses_api(error: LLMClientError) implementation relies on
fragile string matching of OpenAI error messages; update the function by adding
a clear docstring/comment documenting this fragility and the specific substrings
being checked (e.g., 'v1/responses', 'not supported in the v1/chat/completions',
'not a chat model'), and add a TODO to implement more robust detection later
(e.g., checking error codes/structured fields if available); keep the existing
logic but annotate the function (and possibly add unit tests referencing
_should_use_responses_api) so future maintainers know why it exists and where to
improve it.

940-945: Inconsistent error handling between streaming and non-streaming paths.

The non-streaming _parse_responses_response() (lines 579-589) checks stop_reason before raising an error, calling _raise_max_tokens_error() for MAX_TOKENS and raising a content filter error for STOP_SEQUENCE. The streaming path here raises a generic error regardless of stop_reason.

For consistency, consider aligning the error handling:

♻️ Proposed fix for consistent error handling
         if not content and not tool_calls:
+            if stop_reason == StopReason.MAX_TOKENS:
+                self._raise_max_tokens_error(usage.input_tokens)
+            elif stop_reason == StopReason.STOP_SEQUENCE:
+                raise LLMClientError(LLMError(
+                    message='Response blocked by content filter.',
+                    code='content_filter',
+                    provider=self.provider_name,
+                    retryable=False
+                ))
             raise LLMClientError(LLMError(
                 message='No response content returned from API',
                 provider=self.provider_name,
                 retryable=False
             ))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 940 - 945, The streaming
path currently raises a generic LLMClientError when both content and tool_calls
are empty; instead, mirror the non-streaming logic in
_parse_responses_response(): inspect the stop_reason from the response in the
streaming handler (the function that contains this content/tool_calls check),
call _raise_max_tokens_error() when stop_reason == MAX_TOKENS, and raise the
content filter error when stop_reason == STOP_SEQUENCE, otherwise raise the
existing generic 'No response content returned from API' error; reference and
reuse _raise_max_tokens_error() and the same error construction used in
_parse_responses_response() to keep behavior consistent across streaming and
non-streaming paths.

817-822: Same inconsistency in Chat Completions streaming path.

The _read_openai_stream() method also raises a generic error without checking stop_reason. Apply the same fix pattern here for consistency with _parse_response() (lines 492-503).

♻️ Proposed fix
         if not content and not tool_calls:
+            if stop_reason == StopReason.MAX_TOKENS:
+                self._raise_max_tokens_error(usage.input_tokens)
             raise LLMClientError(LLMError(
                 message='No response content returned from API',
                 provider=self.provider_name,
                 retryable=False
             ))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/pgadmin/llm/providers/openai.py` around lines 817 - 822, In
_read_openai_stream(), when you hit the branch that currently raises
LLMClientError for "No response content returned from API" (the case with no
content and no tool_calls), inspect the stop_reason value and set the
LLMError.retryable flag the same way as in _parse_response(): treat stop_reason
== 'length' as retryable=True and all other stop reasons as retryable=False;
include the provider (self.provider_name) and the stop_reason in the error
message so callers get the same behavior and diagnostics as _parse_response().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1a908a3c-734e-4b5d-9b66-0cb1fc81ea9e

📥 Commits

Reviewing files that changed from the base of the PR and between 4959c75 and eaa9261.

📒 Files selected for processing (1)
  • web/pgadmin/llm/providers/openai.py
