Pull request overview
Adds Google Cloud Axion-related sources/questions to the embedding-generation pipeline and updates source-registration behavior to keep newly discovered sources grouped by site in the sources CSV.
Changes:
- Added Google Cloud (Axion / Arm Compute Engine + GKE) documentation entries to `vector-db-sources.csv`.
- Updated `register_source` to insert newly discovered sources immediately after the existing block for the same `site_name` (instead of always appending).
- Extended retrieval evaluation questions with Axion-focused prompts and added a unit test for the new insertion behavior.
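To make the intended grouping concrete, here is a minimal sketch of the insertion rule as this overview describes it. It assumes sources are dicts with a `site_name` key and that a source for an unseen site is appended at the end; `insert_grouped` is a hypothetical stand-in, not the actual `register_source` signature.

```python
from typing import List, Optional


def insert_grouped(all_sources: List[dict], new_source: dict) -> int:
    """Hypothetical stand-in for the grouping step in register_source.

    Inserts new_source right after the last entry with the same site_name,
    or appends it when no entry for that site exists yet (assumed fallback).
    Returns the index where the source was placed.
    """
    site_name = new_source.get('site_name')
    insert_at: Optional[int] = None
    for index, existing_source in enumerate(all_sources):
        if existing_source.get('site_name') == site_name:
            # Keep overwriting so we end up just past the last match.
            insert_at = index + 1
    if insert_at is None:
        # No existing block for this site: fall back to appending.
        insert_at = len(all_sources)
    all_sources.insert(insert_at, new_source)
    return insert_at
```

For example, inserting a second `A` source into `[A, B]` yields `[A, A, B]`, while a source for a new site `C` lands at the end.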
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| embedding-generation/vector-db-sources.csv | Adds Google Cloud Axion/GKE sources and relocates the Graviton guide block within the CSV. |
| embedding-generation/generate-chunks.py | Changes source registration to keep new sources grouped by site_name when saving back to CSV. |
| embedding-generation/tests/test_generate_chunks.py | Adds a test asserting grouped insertion behavior for register_source. |
| embedding-generation/eval_questions.json | Replaces an AWS Graviton runbook eval question with a set of Axion-related eval questions. |
```python
for index, existing_source in enumerate(all_sources):
    if existing_source.get('site_name') == site_name:
        insert_at = index + 1
```
The insertion-point search scans the full all_sources list even though only the last entry matching site_name matters. Every registration repeats that scan, so the cost grows with both the list length and the number of newly registered sources. Consider iterating from the end and breaking on the first match (or maintaining a site_name->last_index map) so that a site that already has entries no longer triggers a full-list scan.
Suggested change:

```diff
-for index, existing_source in enumerate(all_sources):
-    if existing_source.get('site_name') == site_name:
-        insert_at = index + 1
+for index in range(len(all_sources) - 1, -1, -1):
+    if all_sources[index].get('site_name') == site_name:
+        insert_at = index + 1
+        break
```
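The comment's other option, a site_name->last_index map, could look roughly like the sketch below. `GroupedSources` is a hypothetical helper (not part of the PR), and it assumes sources are dicts keyed by `site_name` with unseen sites appended at the end.

```python
from typing import Dict, List, Optional


class GroupedSources:
    """Hypothetical helper that keeps a site_name -> last-index map.

    Finding the insertion point becomes a dict lookup instead of a scan
    over all_sources; only the stored indices need shifting afterwards.
    """

    def __init__(self, all_sources: List[dict]) -> None:
        self.all_sources: List[dict] = list(all_sources)
        self.last_index: Dict[Optional[str], int] = {}
        for index, source in enumerate(self.all_sources):
            # Later entries overwrite earlier ones, so this keeps the last match.
            self.last_index[source.get('site_name')] = index

    def insert(self, new_source: dict) -> int:
        site_name = new_source.get('site_name')
        if site_name in self.last_index:
            insert_at = self.last_index[site_name] + 1
        else:
            # Unknown site: append at the end.
            insert_at = len(self.all_sources)
        self.all_sources.insert(insert_at, new_source)
        # Entries at or after insert_at moved one slot to the right.
        self.last_index = {
            name: index + 1 if index >= insert_at else index
            for name, index in self.last_index.items()
        }
        self.last_index[site_name] = insert_at
        return insert_at
```

Note that `list.insert` itself still shifts elements, so this removes the search rather than all linear work; the index update is proportional to the number of distinct sites, which is usually far smaller than the number of sources.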
```csv
Graviton Getting Started Guide,CC4.0,Graviton Getting Started,https://github.com/aws/aws-graviton-getting-started/blob/main/README.md,aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - C/C++,https://github.com/aws/aws-graviton-getting-started/blob/main/c-c++.md,c; c++; aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - Golang,https://github.com/aws/aws-graviton-getting-started/blob/main/golang.md,golang; go; aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - Java,https://github.com/aws/aws-graviton-getting-started/blob/main/java.md,java; aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - Dotnet,https://github.com/aws/aws-graviton-getting-started/blob/main/dotnet.md,.net; dotnet; aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - Nodejs,https://github.com/aws/aws-graviton-getting-started/blob/main/nodejs.md,nodejs; node; aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - PHP,https://github.com/aws/aws-graviton-getting-started/blob/main/php.md,php; web; aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - Python,https://github.com/aws/aws-graviton-getting-started/blob/main/python.md,python; aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - Rust,https://github.com/aws/aws-graviton-getting-started/blob/main/rust.md,rust; aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - Containers,https://github.com/aws/aws-graviton-getting-started/blob/main/containers.md,containers; container; docker; kubernetes; aws; gravition; basics; started; graviton2; graviton3; graviton4
Graviton Getting Started Guide,CC4.0,Graviton - Headless websites,https://github.com/aws/aws-graviton-getting-started/blob/main/software/ChromeAndPuppeteer.md,headless; website; web; aws; gravition; basics; started; graviton2; graviton3; graviton4
```
Keywords include a misspelling: gravition should be graviton. Since keywords drive retrieval, the typo will reduce searchability across the entire Graviton section; please correct it consistently in these rows.
Suggested change:

```diff
-Graviton Getting Started Guide,CC4.0,Graviton Getting Started,https://github.com/aws/aws-graviton-getting-started/blob/main/README.md,aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - C/C++,https://github.com/aws/aws-graviton-getting-started/blob/main/c-c++.md,c; c++; aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - Golang,https://github.com/aws/aws-graviton-getting-started/blob/main/golang.md,golang; go; aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - Java,https://github.com/aws/aws-graviton-getting-started/blob/main/java.md,java; aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - Dotnet,https://github.com/aws/aws-graviton-getting-started/blob/main/dotnet.md,.net; dotnet; aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - Nodejs,https://github.com/aws/aws-graviton-getting-started/blob/main/nodejs.md,nodejs; node; aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - PHP,https://github.com/aws/aws-graviton-getting-started/blob/main/php.md,php; web; aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - Python,https://github.com/aws/aws-graviton-getting-started/blob/main/python.md,python; aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - Rust,https://github.com/aws/aws-graviton-getting-started/blob/main/rust.md,rust; aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - Containers,https://github.com/aws/aws-graviton-getting-started/blob/main/containers.md,containers; container; docker; kubernetes; aws; gravition; basics; started; graviton2; graviton3; graviton4
-Graviton Getting Started Guide,CC4.0,Graviton - Headless websites,https://github.com/aws/aws-graviton-getting-started/blob/main/software/ChromeAndPuppeteer.md,headless; website; web; aws; gravition; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton Getting Started,https://github.com/aws/aws-graviton-getting-started/blob/main/README.md,aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - C/C++,https://github.com/aws/aws-graviton-getting-started/blob/main/c-c++.md,c; c++; aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - Golang,https://github.com/aws/aws-graviton-getting-started/blob/main/golang.md,golang; go; aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - Java,https://github.com/aws/aws-graviton-getting-started/blob/main/java.md,java; aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - Dotnet,https://github.com/aws/aws-graviton-getting-started/blob/main/dotnet.md,.net; dotnet; aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - Nodejs,https://github.com/aws/aws-graviton-getting-started/blob/main/nodejs.md,nodejs; node; aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - PHP,https://github.com/aws/aws-graviton-getting-started/blob/main/php.md,php; web; aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - Python,https://github.com/aws/aws-graviton-getting-started/blob/main/python.md,python; aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - Rust,https://github.com/aws/aws-graviton-getting-started/blob/main/rust.md,rust; aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - Containers,https://github.com/aws/aws-graviton-getting-started/blob/main/containers.md,containers; container; docker; kubernetes; aws; graviton; basics; started; graviton2; graviton3; graviton4
+Graviton Getting Started Guide,CC4.0,Graviton - Headless websites,https://github.com/aws/aws-graviton-getting-started/blob/main/software/ChromeAndPuppeteer.md,headless; website; web; aws; graviton; basics; started; graviton2; graviton3; graviton4
```
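Rather than hand-editing each row, the typo could be fixed mechanically. The sketch below assumes, based on the rows shown, that the keywords are always the last CSV column and are separated by `; `; `KEYWORD_FIXES`, `fix_keywords`, and `fix_csv_text` are hypothetical names, not part of the repository.

```python
import csv
import io
from typing import Dict, List

# Assumed mapping of misspelled keyword -> corrected keyword.
KEYWORD_FIXES: Dict[str, str] = {"gravition": "graviton"}


def fix_keywords(field: str, fixes: Dict[str, str] = KEYWORD_FIXES) -> str:
    """Correct misspelled entries in a '; '-separated keyword field."""
    fixed: List[str] = []
    for keyword in (part.strip() for part in field.split(";")):
        keyword = fixes.get(keyword, keyword)
        # Skip empties and avoid duplicates if the correct spelling already exists.
        if keyword and keyword not in fixed:
            fixed.append(keyword)
    return "; ".join(fixed)


def fix_csv_text(text: str) -> str:
    """Rewrite CSV text, assuming the keywords are always the last column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if row:
            row[-1] = fix_keywords(row[-1])
        writer.writerow(row)
    return out.getvalue()
```

Going through the `csv` module rather than a plain string replace keeps the fix confined to the keywords column, so titles or URLs that happen to contain the same letters are left untouched.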
```diff
   "question": "What Google Axion-backed Compute Engine machine series are available for Arm VMs, and how do C4A and N4A differ?",
   "expected_urls": [
-    "https://github.com/aws/aws-graviton-getting-started/blob/main/perfrunbook/defining_your_benchmark.md",
-    "https://github.com/aws/aws-graviton-getting-started/blob/main/perfrunbook/configuring_your_sut.md"
+    "https://docs.cloud.google.com/compute/docs/instances/arm-on-compute"
   ]
 },
```
This change replaces an existing Graviton performance-runbook eval question with new Axion questions. Given the PR title (“add axion data”), confirm whether removing the Graviton evaluation coverage is intentional; if not, please re-add the Graviton question(s) so retrieval evaluation continues to exercise those sources too.
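One way to catch this kind of regression automatically is to check that every site in the sources CSV is still referenced by at least one eval question. This is a minimal sketch, assuming each question dict has an `expected_urls` list (as in `eval_questions.json`) and each source dict has `site_name` plus a hypothetical `url` key; `uncovered_sites` is an illustrative name only.

```python
from typing import Dict, Iterable, List, Set


def uncovered_sites(questions: Iterable[dict], sources: Iterable[dict]) -> List[str]:
    """Return site_name values with no URL in any question's expected_urls."""
    expected: Set[str] = set()
    for question in questions:
        expected.update(question.get("expected_urls", []))
    covered: Dict[str, bool] = {}
    for source in sources:
        site = source["site_name"]
        # A site counts as covered if any one of its URLs is expected somewhere.
        covered[site] = covered.get(site, False) or source["url"] in expected
    # dicts keep insertion order, so results follow the order of the sources.
    return [site for site, is_covered in covered.items() if not is_covered]
```

Run against the sources and questions in this PR, a non-empty result for the Graviton guide would flag that its retrieval coverage was dropped.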
No description provided.