Indicate server-side tool calls early when streaming

TLDR: When using grounding (in my case GoogleSearch) with `generate_content_stream`, would it be possible to send a `Part` with metadata indicating that this activity is happening _before_ streaming the output of the model?

Long description (as per template):

I have spent the last 2 days implementing "grounding with Google Search" ([xref docs](https://ai.google.dev/gemini-api/docs/google-search)) in my application. My app is "streaming by default", meaning that all requests use `generate_content_stream`.

The piece that this SDK seems to be missing, and that I spent most of my time fruitlessly searching for, is some kind of indicator that a server-side tool call is happening while we wait for content to be generated.

Currently I just get a long TTFR (time to first response), immediately followed by the first content part. The first indication of a web search is the _last_ chunk/part of the stream, which contains all of the `grounding_metadata` in one big chunk. Could we break up this chunk and stream the pieces as they become available?
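To make the current behavior concrete, here is a minimal sketch of how one could check where in a stream the grounding metadata first shows up. It does not call the API: `fake_chunk` and `simulated_stream` are hypothetical stand-ins I made up to mimic the shape I observe (`chunk.candidates[i].grounding_metadata`), so treat it as an illustration rather than a reproduction.

```python
from types import SimpleNamespace


def first_grounding_index(chunks):
    """Return (index of the first chunk carrying grounding metadata, chunk count).

    The index is None if no chunk in the stream carries grounding metadata.
    """
    first = None
    total = 0
    for i, chunk in enumerate(chunks):
        total += 1
        candidates = getattr(chunk, "candidates", None) or []
        has_metadata = any(
            getattr(c, "grounding_metadata", None) is not None for c in candidates
        )
        if first is None and has_metadata:
            first = i
    return first, total


def fake_chunk(text, grounding_metadata=None):
    # Hypothetical stand-in for one streamed response chunk (not an SDK type).
    candidate = SimpleNamespace(grounding_metadata=grounding_metadata)
    return SimpleNamespace(text=text, candidates=[candidate])


# Simulated stream mirroring what I see today: text first, metadata only at the end.
simulated_stream = [
    fake_chunk("Paris is "),
    fake_chunk("the capital "),
    fake_chunk(
        "of France.",
        SimpleNamespace(web_search_queries=["capital of France"]),
    ),
]

first, total = first_grounding_index(simulated_stream)
print(f"grounding metadata first seen in chunk {first + 1} of {total}")
```

The same function should accept the iterator returned by `generate_content_stream` directly, since it only reads `candidates` and `grounding_metadata` off each chunk, but I have only run it against the simulated data above.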

For context, I've been through the same "search-based grounding" exercise with the OpenAI SDK, Anthropic SDK, and xAI SDK before this, and all of them send (server-side) tool_call/tool_response pairs when a search happens. The tool call message arrives almost immediately (very low "TTFR"), the tool result arrives once the search is done and may include an encrypted result (similar to reasoning), and then content generation starts.

Here is the ideal behavior:
- client sends request via `generate_content_stream`
- server sends a `Part` with `grounding_metadata.web_search_queries`
- server sends a `Part` with `grounding_metadata.grounding_chunks` as they become available
- (repeat for follow-up searches)
- server sends the first `Part` with `content` of the output
- (repeat for all content chunks)
- server sends final `Part` with `grounding_metadata.grounding_supports`
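The ordering above could be consumed on the client roughly like this. This is a sketch of the requested behavior, not of anything the API does today: the stream is simulated, `fake_chunk` and `ui_events` are hypothetical helpers, and the event names ("searching", "sources", "text", "citations") are just what my UI would want.

```python
from types import SimpleNamespace


def fake_chunk(text=None, **metadata_fields):
    # Hypothetical stand-in for a streamed chunk; metadata is None unless given.
    meta = None
    if metadata_fields:
        fields = {
            "web_search_queries": None,
            "grounding_chunks": None,
            "grounding_supports": None,
        }
        fields.update(metadata_fields)
        meta = SimpleNamespace(**fields)
    candidate = SimpleNamespace(grounding_metadata=meta)
    return SimpleNamespace(text=text, candidates=[candidate])


def ui_events(chunks):
    """Translate chunks into coarse UI events, preserving arrival order."""
    for chunk in chunks:
        for candidate in chunk.candidates or []:
            meta = candidate.grounding_metadata
            if meta is None:
                continue
            if meta.web_search_queries:
                yield ("searching", list(meta.web_search_queries))
            if meta.grounding_chunks:
                yield ("sources", [g.web.uri for g in meta.grounding_chunks])
            if meta.grounding_supports:
                yield ("citations", len(meta.grounding_supports))
        if chunk.text:
            yield ("text", chunk.text)


source = SimpleNamespace(web=SimpleNamespace(uri="https://example.com/a"))

# Simulated stream in the *proposed* order (queries, sources, text, supports).
proposed_stream = [
    fake_chunk(web_search_queries=["capital of France"]),
    fake_chunk(grounding_chunks=[source]),
    fake_chunk("Paris is the capital "),
    fake_chunk("of France."),
    fake_chunk(grounding_supports=[SimpleNamespace()]),
]

events = list(ui_events(proposed_stream))
for event in events:
    print(event)
```

With this ordering the UI can show "searching..." as soon as the first chunk arrives instead of sitting idle until text starts.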

Bonus points if we can somehow correlate `grounding_chunks` with individual search queries so that we can group them in the UI. This pattern, of course, extends to all other server-side calls.
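For the grouping idea, something like the following would be enough on the client side. Note that `query_index` is a made-up field: as far as I can tell nothing in the current `grounding_metadata` links a grounding chunk back to the query that produced it, so this only sketches what such a link would enable.

```python
def group_sources_by_query(queries, sources):
    """Group source URIs under the query that produced them.

    `sources` is a list of dicts with a "uri" key and a hypothetical
    "query_index" key pointing into `queries`. Sources without a usable
    index are returned separately.
    """
    groups = [(query, []) for query in queries]
    unattributed = []
    for src in sources:
        idx = src.get("query_index")  # hypothetical field, not in the API today
        if isinstance(idx, int) and 0 <= idx < len(queries):
            groups[idx][1].append(src["uri"])
        else:
            unattributed.append(src["uri"])
    return groups, unattributed


queries = ["capital of France", "population of Paris"]
sources = [
    {"uri": "https://example.com/a", "query_index": 0},
    {"uri": "https://example.com/b", "query_index": 1},
    {"uri": "https://example.com/c", "query_index": 1},
    {"uri": "https://example.com/d"},  # no link back to a query
]

groups, unattributed = group_sources_by_query(queries, sources)
for query, uris in groups:
    print(query, uris)
print("unattributed:", unattributed)
```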

Here is the minimal behavior to solve my problem:
- client sends request via `generate_content_stream`
- server sends a `Part` with _some_ indication that a server-side tool call is happening.
- continue as it works today.

Indicate server-side tool calls early when streaming #1940
