Description
TLDR: When using grounding (in my case GoogleSearch) with generate_content_stream, would it be possible to send a Part with metadata indicating that this activity is happening before streaming the output of the model?
Long description (as per template):
I have spent the last 2 days implementing "grounding with google search" (xref docs) in my application. My app is "streaming by default", meaning that all requests use generate_content_stream.
The piece that this SDK seems to be missing, and that I spent most of my time fruitlessly searching for, is some kind of indicator that a server-side tool call is happening while we wait for content to be generated.
Currently I just get a long TTFR (time to first response) and then immediately the first content part. The first indication of a web search is the last chunk/part of the stream which contains all the grounding_metadata as one big chunk. Could we break up this chunk and stream pieces as they become available?
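To make the current behavior concrete, here is a minimal sketch that scans a stream and reports which chunk first carries grounding metadata. It runs against stand-in objects rather than the real API: fake_chunk and fake_stream are hypothetical, and the attribute path candidates[0].grounding_metadata is assumed to mirror the shape of the streamed response.

```python
from types import SimpleNamespace


def first_grounding_index(chunks):
    """Return (index, total): index of the first chunk with grounding metadata.

    index is None when no chunk in the stream carries any.
    """
    first = None
    total = 0
    for i, chunk in enumerate(chunks):
        total += 1
        candidates = getattr(chunk, "candidates", None) or []
        metadata = (
            getattr(candidates[0], "grounding_metadata", None) if candidates else None
        )
        if metadata is not None and first is None:
            first = i
    return first, total


def fake_chunk(text, metadata=None):
    # Hypothetical stand-in for one streamed response chunk.
    candidate = SimpleNamespace(grounding_metadata=metadata)
    return SimpleNamespace(text=text, candidates=[candidate])


# Simulates the behavior described above: metadata arrives only on the last chunk.
fake_stream = [
    fake_chunk("Hello "),
    fake_chunk("world"),
    fake_chunk("", SimpleNamespace(web_search_queries=["example query"])),
]

index, total = first_grounding_index(fake_stream)
print(f"first grounding metadata at chunk {index} of {total}")
```

With the real stream, the same function should show the metadata landing on the final chunk, which is too late to drive a "searching" indicator in the UI.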
For context, I've been through the same "search-based grounding" exercise with the OpenAI SDK, Anthropic SDK, and xAI SDK before this and all of them send (server-side) tool_call/tool_response pairs when search happens. The tool call message is near immediate (very low "TTFR"), the tool result comes when search is done and may include an encrypted result (similar to reasoning), and then content generation starts.
Here is the ideal behavior:
- client sends request via generate_content_stream
- server sends a Part with grounding_metadata.web_search_queries
- server sends a Part with grounding_metadata.grounding_chunks as they become available
- (repeat for follow-up searches)
- server sends the first Part with content of the output
- (repeat for all content chunks)
- server sends final Part with grounding_metadata.grounding_supports
Bonus points if we can somehow correlate grounding_chunks with individual search queries so that we can group them in the UI. This pattern, of course, extends to all other server-side calls.
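As a rough sketch of what a client could do with the ideal ordering, the following folds a stream of events into UI state and groups sources per search query. Everything here is hypothetical: the event kinds ("queries", "grounding_chunk", "content", "supports"), the tuple shape, and especially query_index, which stands in for the correlation requested as a bonus and is not an existing field.

```python
from dataclasses import dataclass, field


@dataclass
class SearchGroup:
    query: str
    sources: list = field(default_factory=list)


def consume(events):
    """Fold (kind, payload) events into (groups, text, supports).

    Assumed order per the proposal: queries, grounding chunks (repeated for
    follow-up searches), content chunks, then supports last.
    """
    groups = []      # one SearchGroup per query, in arrival order
    text_parts = []  # streamed output text
    supports = []    # citation spans
    for kind, payload in events:
        if kind == "queries":
            groups.extend(SearchGroup(q) for q in payload)
        elif kind == "grounding_chunk":
            # query_index is the hypothetical correlation to a search query.
            groups[payload["query_index"]].sources.append(payload["uri"])
        elif kind == "content":
            text_parts.append(payload)
        elif kind == "supports":
            supports.extend(payload)
    return groups, "".join(text_parts), supports


events = [
    ("queries", ["first query"]),
    ("grounding_chunk", {"query_index": 0, "uri": "https://example.com/a"}),
    ("queries", ["follow-up query"]),
    ("grounding_chunk", {"query_index": 1, "uri": "https://example.com/b"}),
    ("content", "Hello "),
    ("content", "world"),
    ("supports", [{"start": 0, "end": 11, "sources": [0, 1]}]),
]

groups, text, supports = consume(events)
for group in groups:
    print(group.query, "->", group.sources)
print(text)
```

Because queries arrive before any content, the UI can render each search and its sources while the user is still waiting for the first output token.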
Here is the minimal behavior to solve my problem:
- client sends request via generate_content_stream
- server sends a Part with some indication that a server-side tool call is happening.
- continue as it works today.
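The minimal variant would need only one extra branch on the client. A sketch, assuming the indicator arrives as a leading item with a hypothetical server_tool_call key (the key name and the dict shape are made up for illustration and are not part of the SDK):

```python
def render(stream):
    """Yield UI updates for a stream that may start with a tool-activity marker."""
    for item in stream:
        if isinstance(item, dict) and item.get("server_tool_call"):
            # Hypothetical marker: lets the UI show progress right away.
            yield f"[status] running {item['server_tool_call']}..."
        else:
            # Regular content is passed through unchanged, as it works today.
            yield item


stream = [{"server_tool_call": "google_search"}, "Hello ", "world"]
print(list(render(stream)))
```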