Indicate server-side tool calls early when streaming #1940

@FirefoxMetzger

TLDR: When using grounding (in my case GoogleSearch) with generate_content_stream, would it be possible to send a Part with metadata indicating that this activity is happening before streaming the output of the model?

Long description (as per template):

I have spent the last 2 days implementing "grounding with google search" (xref docs) in my application. My app is "streaming by default", meaning that all requests use generate_content_stream.

The piece that this SDK seems to be missing, and that I spent most of my time fruitlessly searching for, is some kind of indicator that a server-side tool call is happening while we wait for content to be generated.

Currently I just get a long TTFR (time to first response) and then immediately the first content part. The first indication that a web search happened is the last chunk/part of the stream, which contains all the grounding_metadata as one big chunk. Could we break up this chunk and stream pieces as they become available?
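To make the current behavior measurable, here is a rough sketch of a helper that walks a stream and reports the TTFR plus the index of the first chunk that carries grounding metadata. It is duck-typed and runs against stand-in objects: the `candidates[0].grounding_metadata` layout is my assumption about the chunk shape, and `fake_chunk` and `inspect_stream` are hypothetical names, not part of the SDK.

```python
import time
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Optional


def fake_chunk(text: str, grounding_metadata: Any = None) -> SimpleNamespace:
    """Stand-in for a streamed chunk (assumed shape, not the SDK's response type)."""
    candidate = SimpleNamespace(text=text, grounding_metadata=grounding_metadata)
    return SimpleNamespace(candidates=[candidate])


def inspect_stream(stream: Iterable[Any]) -> Dict[str, Optional[float]]:
    """Report TTFR and where grounding metadata first shows up in the stream."""
    start = time.monotonic()
    ttfr: Optional[float] = None
    first_grounded_index: Optional[int] = None
    total = 0
    for index, chunk in enumerate(stream):
        if ttfr is None:
            # Time until the first chunk of any kind arrives.
            ttfr = time.monotonic() - start
        total = index + 1
        candidates = getattr(chunk, "candidates", None) or []
        metadata = getattr(candidates[0], "grounding_metadata", None) if candidates else None
        if metadata is not None and first_grounded_index is None:
            first_grounded_index = index
    return {
        "ttfr_seconds": ttfr,
        "first_grounded_index": first_grounded_index,
        "total_chunks": total,
    }


# Simulates what I see today: content first, all grounding data in the last chunk.
today = [
    fake_chunk("The answer "),
    fake_chunk("is 42."),
    fake_chunk("", grounding_metadata=SimpleNamespace(web_search_queries=["answer"])),
]
report = inspect_stream(today)
```

With a stream shaped like the one I get today, `first_grounded_index` equals `total_chunks - 1`, i.e. the search only becomes visible once generation has finished.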

For context, I've been through the same "search-based grounding" exercise with the OpenAI SDK, Anthropic SDK, and xAI SDK before this and all of them send (server-side) tool_call/tool_response pairs when search happens. The tool call message is near immediate (very low "TTFR"), the tool result comes when search is done and may include an encrypted result (similar to reasoning), and then content generation starts.

Here is the ideal behavior:

  • client sends request via generate_content_stream
  • server sends a Part with grounding_metadata.web_search_queries
  • server sends a Part with grounding_metadata.grounding_chunks as they become available
  • (repeat for follow-up searches)
  • server sends the first Part with content of the output
  • (repeat for all content chunks)
  • server sends final Part with grounding_metadata.grounding_supports
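A client for the ideal ordering above could look roughly like the sketch below. It assumes a behavior that does not exist today: every chunk may carry a partial `grounding_metadata` object with only the fields that are ready. The chunk shape, `make_chunk`, `stream_events`, and the event names are all hypothetical and only illustrate how a UI could consume such a stream.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator, List, Optional, Tuple


def make_chunk(
    text: Optional[str] = None,
    queries: Optional[List[str]] = None,
    sources: Optional[List[str]] = None,
    supports: Optional[List[str]] = None,
) -> SimpleNamespace:
    """Build a stand-in chunk (hypothetical shape, not the SDK's response type)."""
    has_metadata = any(value is not None for value in (queries, sources, supports))
    metadata = None
    if has_metadata:
        metadata = SimpleNamespace(
            web_search_queries=queries,
            grounding_chunks=sources,
            grounding_supports=supports,
        )
    return SimpleNamespace(text=text, grounding_metadata=metadata)


def stream_events(stream: Iterable[Any]) -> Iterator[Tuple[str, Any]]:
    """Translate chunks into UI events in the order they arrive."""
    for chunk in stream:
        metadata = getattr(chunk, "grounding_metadata", None)
        if metadata is not None:
            if metadata.web_search_queries:
                yield ("search_started", list(metadata.web_search_queries))
            if metadata.grounding_chunks:
                yield ("search_results", list(metadata.grounding_chunks))
            if metadata.grounding_supports:
                yield ("citations", list(metadata.grounding_supports))
        if getattr(chunk, "text", None):
            yield ("content", chunk.text)


# The proposed ordering: queries, results, content chunks, then supports.
ideal_stream = [
    make_chunk(queries=["python release date"]),
    make_chunk(sources=["https://example.com/a"]),
    make_chunk(text="Python was "),
    make_chunk(text="released in 1991."),
    make_chunk(supports=["segment 0 -> source 0"]),
]
events = list(stream_events(ideal_stream))
```

The point is that the UI can show "searching" and the list of sources before the first token of output, instead of reconstructing everything from the final chunk.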

Bonus points if we can somehow correlate grounding_chunks with individual search queries so that we can group them in the UI. This pattern, of course, extends to all other server-side calls.

Here is the minimal behavior to solve my problem:

  • client sends request via generate_content_stream
  • server sends a Part with some indication that a server-side tool call is happening.
  • continue as it works today.
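For the minimal variant, the client side would be tiny. The sketch below assumes the server sends one early chunk with a marker field, here called `server_tool_call`; that field name, its value, and the `ui_updates` helper are placeholders I made up for illustration, since no such indicator exists in the stream today.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator


def ui_updates(stream: Iterable[Any]) -> Iterator[str]:
    """Yield status lines and text for a UI, given a stream with an early tool marker."""
    for chunk in stream:
        # `server_tool_call` is a hypothetical marker, e.g. "google_search".
        tool_name = getattr(chunk, "server_tool_call", None)
        if tool_name:
            yield f"[status] running {tool_name}..."
        text = getattr(chunk, "text", None)
        if text:
            yield text


# One early indicator chunk, then the stream continues as it works today.
minimal_stream = [
    SimpleNamespace(server_tool_call="google_search", text=None),
    SimpleNamespace(server_tool_call=None, text="Here is what I found: "),
    SimpleNamespace(server_tool_call=None, text="..."),
]
updates = list(ui_updates(minimal_stream))
```

Even this single marker would let the app replace a silent wait with a status line, which is all I need for now.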

Labels

priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)
