diff --git a/databricks-skills/databricks-aibi-dashboards/SKILL.md b/databricks-skills/databricks-aibi-dashboards/SKILL.md index 41dbeec6..726d58bf 100644 --- a/databricks-skills/databricks-aibi-dashboards/SKILL.md +++ b/databricks-skills/databricks-aibi-dashboards/SKILL.md @@ -334,7 +334,11 @@ y=12: Table (w=6, h=6) - Detailed data - `widgetType`: "line" or "bar" - Use `x`, `y`, optional `color` encodings - `scale.type`: `"temporal"` (dates), `"quantitative"` (numbers), `"categorical"` (strings) -- Use `"disaggregated": true` with pre-aggregated dataset data +- **ALWAYS use `"disaggregated": false` with explicit aggregation expressions** (e.g., `SUM(\`col\`)`) for the y-axis metric + +> ⚠️ **CRITICAL — MISSING_GROUP_BY error**: If you use `"disaggregated": true` on a chart (bar/line/pie), Databricks generates SQL with an implicit aggregation but WITHOUT a GROUP BY clause → `[MISSING_GROUP_BY] SQLSTATE: 42803`. +> **Rule**: charts always need `"disaggregated": false` + explicit `SUM()`/`AVG()`/`COUNT()` in the field expression. The field `name` must then match the aggregation pattern, e.g., `"name": "sum(revenue)"` with `"expression": "SUM(\`revenue\`)"`, and `"fieldName": "sum(revenue)"` in encodings. +> `"disaggregated": true` is **only** safe for **tables** (raw row display without aggregation). 
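A minimal before/after sketch of the query shape, in the Python-dict form the deployment script uses (the `sales_ds` dataset and `revenue` column are hypothetical):

```python
# WRONG: raw column + "disaggregated": True on a bar chart.
# Databricks emits SQL with no GROUP BY -> [MISSING_GROUP_BY] SQLSTATE: 42803
bad_query = {
    "datasetName": "sales_ds",
    "fields": [
        {"name": "region", "expression": "`region`"},
        {"name": "revenue", "expression": "`revenue`"},  # no aggregation
    ],
    "disaggregated": True,
}

# RIGHT: explicit SUM + "disaggregated": False.
# The field "name" ("sum(revenue)") must match "fieldName" in the widget encodings.
good_query = {
    "datasetName": "sales_ds",
    "fields": [
        {"name": "region", "expression": "`region`"},
        {"name": "sum(revenue)", "expression": "SUM(`revenue`)"},
    ],
    "disaggregated": False,
}
```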
**Multiple Lines - Two Approaches:** @@ -359,6 +363,36 @@ y=12: Table (w=6, h=6) - Detailed data - **Stacked** (default): No `mark` field - bars stack on top of each other - **Grouped**: Add `"mark": {"layout": "group"}` - bars side-by-side for comparison +**Correct bar chart example (disaggregated: false + SUM):** +```json +{ + "widget": { + "name": "revenue-by-region", + "queries": [{ + "name": "main_query", + "query": { + "datasetName": "sales_ds", + "fields": [ + {"name": "region", "expression": "`region`"}, + {"name": "sum(revenue)", "expression": "SUM(`revenue`)"} + ], + "disaggregated": false + } + }], + "spec": { + "version": 3, + "widgetType": "bar", + "encodings": { + "x": {"fieldName": "region", "scale": {"type": "categorical"}, "displayName": "Region"}, + "y": {"fieldName": "sum(revenue)", "scale": {"type": "quantitative"}, "displayName": "Revenue"} + }, + "frame": {"showTitle": true, "title": "Revenue by Region"} + } + }, + "position": {"x": 0, "y": 0, "width": 3, "height": 5} +} +``` + **Pie Chart:** - `version`: **3** - `widgetType`: "pie" @@ -381,23 +415,33 @@ y=12: Table (w=6, h=6) - Detailed data --- +#### 🔴 CRITICAL: A Filter ONLY Affects Datasets in Its `queries` Array + +> **THIS IS THE MOST COMMON MISTAKE WITH FILTERS.** +> +> A filter widget does **NOT** automatically apply to all datasets or all pages. +> **A filter ONLY filters the datasets explicitly listed in its `queries` array.** +> +> - If a dataset is not in `queries`, the filter has **zero effect** on widgets using that dataset. +> - For global filters to work across all pages, you **MUST** include every dataset in the filter's `queries` list. +> - For page-level filters to work on all widgets on that page, every dataset used on that page must be listed. + +**Practical rule:** Every dataset that has the filter column must be added to the filter's `queries` array (with its own entry). One entry = one dataset. 
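The practical rule translates directly into code: build exactly one `queries` entry per dataset. A sketch with hypothetical dataset names (the extra `_associativity` field used by the full multi-dataset pattern is covered later in this section):

```python
def filter_queries(datasets, field_name):
    """One queries entry per dataset; the filter ignores any dataset not listed here."""
    return [
        {
            "name": f"q_{ds}_{field_name}",  # unique query name per dataset
            "query": {
                "datasetName": ds,
                "fields": [{"name": field_name, "expression": f"`{field_name}`"}],
                "disaggregated": False,
            },
        }
        for ds in datasets
    ]

# Two datasets carry the `region` column, so the filter needs two entries:
queries = filter_queries(["sales_ds", "orders_ds"], "region")
```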
+ +--- + #### Global Filters vs Page-Level Filters | Type | Placement | Scope | Use Case | |------|-----------|-------|----------| -| **Global Filter** | Dedicated page with `"pageType": "PAGE_TYPE_GLOBAL_FILTERS"` | Affects ALL pages that have datasets with the filter field | Cross-dashboard filtering (e.g., date range, campaign) | -| **Page-Level Filter** | Regular page with `"pageType": "PAGE_TYPE_CANVAS"` | Affects ONLY widgets on that same page | Page-specific filtering (e.g., platform filter on breakdown page only) | - -**Key Insight**: A filter only affects datasets that contain the filter field. To have a filter affect only specific pages: -1. Include the filter dimension in datasets for pages that should be filtered -2. Exclude the filter dimension from datasets for pages that should NOT be filtered +| **Global Filter** | Dedicated page with `"pageType": "PAGE_TYPE_GLOBAL_FILTERS"` | Affects ONLY datasets listed in its `queries` array | Cross-dashboard filtering (e.g., region, bodega) | +| **Page-Level Filter** | Regular page with `"pageType": "PAGE_TYPE_CANVAS"` | Affects ONLY datasets listed in its `queries` array, on that same page | Page-specific filtering (e.g., month selector on one page) | --- -#### Filter Widget Structure +#### Filter Widget Structure — Single Dataset -> **CRITICAL**: Do NOT use `associative_filter_predicate_group` - it causes SQL errors! -> Use a simple field expression instead. +Use this when the filter only needs to affect one dataset (e.g., a page-level filter for a unique field): ```json { @@ -432,9 +476,12 @@ y=12: Table (w=6, h=6) - Detailed data --- -#### Global Filter Example +#### Filter Widget Structure — Multiple Datasets (REQUIRED for most real filters) -Place on a dedicated filter page: +When a filter must apply to many datasets (e.g., a global REGION filter that should work on all pages), you **must** list every target dataset in `queries`. Each dataset gets: +1. 
A **unique query name** (use pattern `q_{dataset_name}_{field_name}`) +2. An extra `{field}_associativity` field: `"expression": "COUNT_IF(\`associative_filter_predicate_group\`)"` — this enables cross-dataset associativity and **is required** in the multi-dataset pattern +3. A corresponding entry in `encodings.fields` pointing to that query ```json { @@ -472,11 +519,75 @@ Place on a dedicated filter page: } ``` +> **Note on `associative_filter_predicate_group`**: This is a Databricks internal virtual column used only in the `COUNT_IF(...)` expression within the `_associativity` field. It is **required** in the multi-dataset pattern — do NOT omit it. It is NOT a regular column in the dataset SQL and does NOT cause errors. + +**Python helper pattern for multi-dataset filters:** + +```python +def filt(name, title, primary_ds, field_name, ftype="filter-multi-select", + w=2, h=2, x=0, y=0, extra_datasets=None): + """ + extra_datasets: list of additional dataset names to apply the filter to. + If None or empty, uses simple single-dataset pattern (no associativity). + If provided, uses multi-dataset pattern with associativity fields. + CRITICAL: every dataset that uses this filter field must be in this list. 
+ """ + all_datasets = [primary_ds] + (extra_datasets or []) + if len(all_datasets) == 1: + # Single dataset: simple pattern, queryName = "main_query" + queries = [{"name": "main_query", "query": { + "datasetName": primary_ds, "disaggregated": False, + "fields": [{"name": field_name, "expression": f"`{field_name}`"}]}}] + enc_fields = [{"fieldName": field_name, "displayName": title, "queryName": "main_query"}] + else: + # Multi-dataset: unique query names + associativity field per dataset + queries = [] + enc_fields = [] + for ds in all_datasets: + qname = f"q_{ds}_{field_name}" + queries.append({"name": qname, "query": { + "datasetName": ds, "disaggregated": False, + "fields": [ + {"name": field_name, "expression": f"`{field_name}`"}, + {"name": f"{field_name}_associativity", + "expression": f"COUNT_IF(`associative_filter_predicate_group`)"} + ]}}) + enc_fields.append({"fieldName": field_name, "queryName": qname}) + return { + "position": {"x": x, "y": y, "width": w, "height": h}, + "widget": { + "name": name, + "queries": queries, + "spec": { + "version": 2, "widgetType": ftype, + "encodings": {"fields": enc_fields}, + "frame": {"showTitle": True, "title": title} + } + } + } +``` + +--- + +#### Global Filter Example + +```json +{ + "name": "filters", + "displayName": "Filters", + "pageType": "PAGE_TYPE_GLOBAL_FILTERS", + "layout": [ + // Filter applying to ALL datasets that have the 'region' column + // The widget must list every such dataset in its queries array + ] +} +``` + --- #### Page-Level Filter Example -Place directly on a canvas page (affects only that page): +Place directly on a canvas page (affects only that page). 
Still must list all datasets on that page: ```json { @@ -484,40 +595,33 @@ Place directly on a canvas page (affects only that page): "displayName": "Platform Breakdown", "pageType": "PAGE_TYPE_CANVAS", "layout": [ - { - "widget": { - "name": "page-title", - "multilineTextboxSpec": {"lines": ["## Platform Breakdown"]} - }, - "position": {"x": 0, "y": 0, "width": 4, "height": 1} - }, { "widget": { "name": "filter_platform", - "queries": [{ - "name": "ds_platform", - "query": { - "datasetName": "platform_data", - "fields": [{"name": "platform", "expression": "`platform`"}], - "disaggregated": false + "queries": [ + { + "name": "q_ds_platform_platform", + "query": { + "datasetName": "ds_platform", + "fields": [ + {"name": "platform", "expression": "`platform`"}, + {"name": "platform_associativity", "expression": "COUNT_IF(`associative_filter_predicate_group`)"} + ], + "disaggregated": false + } } - }], + ], "spec": { "version": 2, "widgetType": "filter-multi-select", "encodings": { - "fields": [{ - "fieldName": "platform", - "displayName": "Platform", - "queryName": "ds_platform" - }] + "fields": [{"fieldName": "platform", "queryName": "q_ds_platform_platform"}] }, "frame": {"showTitle": true, "title": "Platform"} } }, "position": {"x": 4, "y": 0, "width": 2, "height": 2} } - // ... other widgets on this page ] } ``` @@ -538,10 +642,17 @@ Before deploying, verify: 4. Chart dimensions have ≤8 distinct values 5. All widget fieldNames match dataset columns exactly 6. **Field `name` in query.fields matches `fieldName` in encodings exactly** (e.g., both `"sum(spend)"`) -7. Counter datasets: use `disaggregated: true` for 1-row datasets, `disaggregated: false` with aggregation for multi-row -8. Percent values are 0-1 (not 0-100) -9. SQL uses Spark syntax (date_sub, not INTERVAL) -10. **All SQL queries tested via `execute_sql` and return expected data** +7. 
**`disaggregated` rules (CRITICAL)**: + - **Charts (bar/line/pie)**: ALWAYS `disaggregated: false` + explicit aggregation (`SUM`, `AVG`, `COUNT`) in the y-axis field expression. Using `disaggregated: true` on charts causes `[MISSING_GROUP_BY] SQLSTATE: 42803`. + - **Counters with multi-row datasets**: `disaggregated: false` + aggregation expression + - **Counters with 1-row pre-aggregated datasets**: `disaggregated: true` + simple field reference + - **Tables**: `disaggregated: true` + simple field references (raw row display) + - **Filters**: `disaggregated: false` + simple field reference (for DISTINCT values) +8. `widgetType` and `frame` ALWAYS go inside the `spec` block — never at the widget root +9. Text widgets: use `multilineTextboxSpec: {lines: [...]}` at widget root — NO `spec` block at all +10. Percent values are 0-1 (not 0-100) +11. SQL uses Spark syntax (date_sub, not INTERVAL) +12. **All SQL queries tested via `execute_sql` and return expected data** --- @@ -678,7 +789,7 @@ dashboard = { }, "position": {"x": 4, "y": 2, "width": 2, "height": 3} }, - # Bar chart - version 3 + # Bar chart - version 3, ALWAYS disaggregated=False + explicit SUM/COUNT { "widget": { "name": "trips-by-zip", @@ -688,9 +799,11 @@ dashboard = { "datasetName": "by_zip", "fields": [ {"name": "pickup_zip", "expression": "`pickup_zip`"}, - {"name": "trip_count", "expression": "`trip_count`"} + # CRITICAL: use SUM/COUNT with disaggregated=False, NOT disaggregated=True with raw field + # Using disaggregated=True on charts causes [MISSING_GROUP_BY] SQLSTATE: 42803 + {"name": "sum(trip_count)", "expression": "SUM(`trip_count`)"} ], - "disaggregated": True + "disaggregated": False # ALWAYS False for charts } }], "spec": { @@ -698,7 +811,8 @@ dashboard = { "widgetType": "bar", "encodings": { "x": {"fieldName": "pickup_zip", "scale": {"type": "categorical"}, "displayName": "ZIP"}, - "y": {"fieldName": "trip_count", "scale": {"type": "quantitative"}, "displayName": "Trips"} + # fieldName 
must match exactly the "name" in fields above + "y": {"fieldName": "sum(trip_count)", "scale": {"type": "quantitative"}, "displayName": "Trips"} }, "frame": {"title": "Trips by Pickup ZIP", "showTitle": True} } @@ -884,10 +998,15 @@ print(result["url"]) - Use `version: 2` (NOT 3) - Ensure dataset returns exactly 1 row -### Dashboard shows empty widgets +### Dashboard shows empty widgets or [MISSING_GROUP_BY] SQLSTATE: 42803 - Run the dataset SQL query directly to check data exists - Verify column aliases match widget field expressions -- Check `disaggregated` flag (should be `true` for pre-aggregated data) +- Check `disaggregated` flag per widget type: + - **Charts (bar/line/pie)**: MUST be `false` + explicit aggregation in y-field (`SUM(...)`, etc.). `true` causes `[MISSING_GROUP_BY]` error. + - **Tables**: MUST be `true` (raw row display) + - **Counters (multi-row dataset)**: `false` + aggregation + - **Counters (1-row dataset)**: `true` + simple reference +- Verify `widgetType` and `frame` are inside `spec`, NOT at the widget root level ### Layout has gaps - Ensure each row sums to width=6 diff --git a/databricks-skills/databricks-genie/SKILL.md b/databricks-skills/databricks-genie/SKILL.md index 3f08628c..05b28425 100644 --- a/databricks-skills/databricks-genie/SKILL.md +++ b/databricks-skills/databricks-genie/SKILL.md @@ -126,3 +126,29 @@ Use these skills in sequence: - **[databricks-synthetic-data-generation](../databricks-synthetic-data-generation/SKILL.md)** - Generate raw parquet data to populate tables for Genie - **[databricks-spark-declarative-pipelines](../databricks-spark-declarative-pipelines/SKILL.md)** - Build bronze/silver/gold tables consumed by Genie Spaces - **[databricks-unity-catalog](../databricks-unity-catalog/SKILL.md)** - Manage the catalogs, schemas, and tables Genie queries + +--- + +## Advanced: Full Genie Configuration via serialized_space + +The `create_or_update_genie` tool only sets basic fields (title, description, tables, sample 
questions). To populate **all** Genie UI sections, use `PATCH /api/2.0/genie/spaces/{id}` with the `serialized_space` field. + +### Sections that require serialized_space + +| UI Section | serialized_space field | +|---|---| +| General Instructions | `instructions.text_instructions` (max 1 item) | +| SQL queries & functions | `instructions.example_question_sqls` | +| Common SQL Expressions | `instructions.sql_snippets.{filters,expressions,measures}` | +| Benchmarks | `benchmarks.questions` | + +### Workflow for full configuration + +``` +1. create_or_update_genie(...) → creates space with tables + sample_questions +2. GET /api/2.0/genie/spaces/{id}?include_serialized_space=true → fetch existing data +3. Build serialized_space dict with all sections +4. PATCH /api/2.0/genie/spaces/{id} → apply with {"serialized_space": json.dumps(ss)} +``` + +See [spaces.md](spaces.md#advanced-serialized_space-api) for the complete code example, JSON schema, and constraints. diff --git a/databricks-skills/databricks-genie/spaces.md b/databricks-skills/databricks-genie/spaces.md index 8549d6bd..e3ab61e6 100644 --- a/databricks-skills/databricks-genie/spaces.md +++ b/databricks-skills/databricks-genie/spaces.md @@ -201,3 +201,252 @@ The tool finds the existing space by name and updates it. - Add table and column comments - Include sample questions that demonstrate the vocabulary - Add instructions via the Databricks Genie UI + +--- + +## Advanced: serialized_space API + +The `create_or_update_genie` tool only populates the basic sections. To fully configure a Genie Space (General Instructions, SQL Queries & Functions, Common SQL Expressions, Benchmarks), use the `serialized_space` API directly. 
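Fetching the current configuration is a single GET; a small sketch (the `genie_space_url` helper is ours, not part of the API):

```python
def genie_space_url(workspace_url, space_id, include_serialized=True):
    """Build the Genie space GET URL, optionally requesting the full serialized_space."""
    base = f"{workspace_url}/api/2.0/genie/spaces/{space_id}"
    return f"{base}?include_serialized_space=true" if include_serialized else base

# Usage (TOKEN is a workspace PAT; requires the `requests` package):
#   resp = requests.get(genie_space_url(WORKSPACE_URL, SPACE_ID),
#                       headers={"Authorization": f"Bearer {TOKEN}"})
#   # serialized_space arrives as a JSON *string*, not a nested object:
#   ss = json.loads(resp.json()["serialized_space"])
```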
+ +### API Endpoints + +| Method | Endpoint | Purpose | +|--------|----------|---------| +| `GET` | `/api/2.0/genie/spaces/{id}?include_serialized_space=true` | Full space with serialized_space JSON | +| `GET` | `/api/2.0/genie/spaces/{id}` | Basic space (no serialized_space) | +| `PATCH` | `/api/2.0/genie/spaces/{id}` | Update serialized_space (**this works**) | +| `PATCH` | `/api/2.0/data-rooms/{id}` | Update description/warehouse/tables only (NOT serialized_space) | + +> **Critical**: `PATCH /api/2.0/data-rooms/{id}` silently empties `table_identifiers` if not re-sent. Always include all table identifiers in every call. +> +> `PUT /api/2.0/genie/spaces/{id}` does NOT exist (returns 404). + +### serialized_space JSON Schema + +```json +{ + "version": 2, + "config": { + "sample_questions": [{"id": "hex32chars", "question": ["question text"]}] + }, + "data_sources": { + "tables": [{"identifier": "catalog.schema.table", "description": ["optional context"]}] + }, + "instructions": { + "text_instructions": [{"id": "hex32chars", "content": ["Combined instructions string"]}], + "example_question_sqls": [{"id": "hex32chars", "question": ["Question?"], "sql": ["SELECT ..."]}], + "join_specs": [ + { + "id": "hex32chars", + "left": {"identifier": "catalog.schema.table_a", "alias": "table_a"}, + "right": {"identifier": "catalog.schema.table_b", "alias": "table_b"}, + "sql": [ + "`table_a`.`key_col` = `table_b`.`key_col`", + "--rt=FROM_RELATIONSHIP_TYPE_ONE_TO_MANY--" + ], + "instruction": ["Human-readable description of why this join is used"] + } + ], + "sql_snippets": { + "filters": [{"id": "hex32chars", "sql": ["filter_expr"], "display_name": "label"}], + "expressions": [{"id": "hex32chars", "alias": "alias", "sql": ["SQL expr"]}], + "measures": [{"id": "hex32chars", "alias": "measure", "sql": ["AGG expr"]}] + } + }, + "benchmarks": { + "questions": [ + {"id": "hex32chars", "question": ["Business question?"], + "answer": [{"format": "SQL", "content": ["SELECT ..."]}]} + ] 
+ } +} +``` + +### Known Constraints + +- **`text_instructions`**: Maximum **1 item**. Combine all instructions into a single `content` string. +- **All lists must be sorted by `id`** (lexicographic hex sort). The API returns 400 if not sorted. +- **`data_sources.tables` must be sorted by `identifier`** alphabetically. +- **`join_specs`**: Supported. See dedicated section below for exact format. The `sql` field takes **backtick-quoted `alias`.`COL`** syntax (NOT plain `table.column`) and a second element with the cardinality marker. +- IDs must be 32-char hex strings: `uuid.uuid4().hex`. + +### join_specs: Format and Code Example + +The `join_specs` section populates the **Joins** tab in the Genie UI. Each spec defines a relationship between two tables, and Genie uses them to build JOINs automatically. + +#### Exact structure (reverse-engineered from Genie UI) + +```json +{ + "id": "hex32chars", + "left": {"identifier": "catalog.schema.table_a", "alias": "table_a"}, + "right": {"identifier": "catalog.schema.table_b", "alias": "table_b"}, + "sql": [ + "`table_a`.`CLAVE_BODEGA` = `table_b`.`CLAVE_BODEGA` AND `table_a`.`CLAVE_CLIENTE` = `table_b`.`CLAVE_CLIENTE`", + "--rt=FROM_RELATIONSHIP_TYPE_ONE_TO_MANY--" + ], + "instruction": ["Join to get all transactions for each client"] +} +``` + +**Key fields:** + +| Field | Required | Notes | +|---|---|---| +| `left.identifier` | Yes | Full `catalog.schema.table` | +| `left.alias` | Yes | Table name only (last segment after `.`) | +| `right.identifier` / `right.alias` | Yes | Same as left | +| `sql[0]` | Yes | Join condition using **`` `alias`.`COLUMN` ``** format with backticks | +| `sql[1]` | Yes | Cardinality marker string (see below) | +| `instruction` | Yes | List with 1 string: description of the join | + +**Cardinality markers (`sql[1]`):** + +``` +--rt=FROM_RELATIONSHIP_TYPE_ONE_TO_ONE-- +--rt=FROM_RELATIONSHIP_TYPE_ONE_TO_MANY-- +--rt=FROM_RELATIONSHIP_TYPE_MANY_TO_ONE-- +--rt=FROM_RELATIONSHIP_TYPE_MANY_TO_MANY-- 
+```
+
+> **Why it failed before**: We tried plain SQL like `table.col = table.col` — the API requires `` `alias`.`col` `` with backticks, plus the cardinality marker as a second `sql` element.
+
+#### Helper function
+
+```python
+import uuid
+
+RT_ONE_TO_ONE = "--rt=FROM_RELATIONSHIP_TYPE_ONE_TO_ONE--"
+RT_ONE_TO_MANY = "--rt=FROM_RELATIONSHIP_TYPE_ONE_TO_MANY--"
+RT_MANY_TO_ONE = "--rt=FROM_RELATIONSHIP_TYPE_MANY_TO_ONE--"
+RT_MANY_TO_MANY = "--rt=FROM_RELATIONSHIP_TYPE_MANY_TO_MANY--"
+
+def uid():
+    """32-char hex id, as required by the serialized_space schema."""
+    return uuid.uuid4().hex
+
+def make_join(left_id, right_id, col_pairs, cardinality, instruction):
+    """
+    col_pairs: list of (left_col, right_col) tuples
+    Uses the short table name (after last dot) as alias.
+    """
+    left_alias = left_id.split(".")[-1]
+    right_alias = right_id.split(".")[-1]
+    cond = " AND ".join(
+        f"`{left_alias}`.`{lc}` = `{right_alias}`.`{rc}`"
+        for lc, rc in col_pairs
+    )
+    return {
+        "id": uid(),
+        "left": {"identifier": left_id, "alias": left_alias},
+        "right": {"identifier": right_id, "alias": right_alias},
+        "sql": [cond, cardinality],
+        "instruction": [instruction],
+    }
+
+# Example usage:
+join_specs = sorted([
+    make_join(
+        "catalog.schema.dim_customer",
+        "catalog.schema.fact_orders",
+        [("customer_id", "customer_id")],
+        RT_ONE_TO_MANY,
+        "Join to get all orders for each customer"
+    ),
+    make_join(
+        "catalog.schema.fact_orders",
+        "catalog.schema.fact_order_items",
+        [("order_id", "order_id")],
+        RT_ONE_TO_MANY,
+        "Join to get line items for each order"
+    ),
+], key=lambda x: x["id"])  # MUST be sorted by id
+```
+
+### How to Apply serialized_space
+
+```python
+import requests, json, uuid
+
+WORKSPACE_URL = "https://adb-XXXXXXXXXXXXXXXXX.XX.azuredatabricks.net"
+TOKEN = "your_token"
+SPACE_ID = "your_space_id"
+HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}
+
+def uid():
+    return uuid.uuid4().hex
+
+# 1. 
Fetch existing data to preserve sample_questions and tables +resp = requests.get( + f"{WORKSPACE_URL}/api/2.0/genie/spaces/{SPACE_ID}?include_serialized_space=true", + headers=HEADERS +) +existing_ss = json.loads(resp.json()["serialized_space"]) + +# 2. Build serialized_space +serialized_space = { + "version": 2, + "config": { + "sample_questions": sorted(existing_ss["config"]["sample_questions"], key=lambda x: x["id"]) + }, + "data_sources": { + "tables": sorted(existing_ss["data_sources"]["tables"], key=lambda t: t["identifier"]) + }, + "instructions": { + # Only 1 text_instructions item allowed — combine all general instructions + "text_instructions": [{"id": uid(), "content": [ + "TERMINOLOGY: 'active customers' = status = 2.\n\n" + "CURRENT STATE: Always filter with MAX(date_col) for snapshots." + ]}], + "example_question_sqls": sorted([ + {"id": uid(), "question": ["How many active customers?"], + "sql": ["SELECT COUNT(*) FROM catalog.schema.customers WHERE status = 2"]} + ], key=lambda x: x["id"]), + "join_specs": sorted([ + make_join( + "catalog.schema.dim_customer", + "catalog.schema.fact_orders", + [("customer_id", "customer_id")], + RT_ONE_TO_MANY, + "Join to get all orders for each customer" + ), + ], key=lambda x: x["id"]), + "sql_snippets": { + "filters": sorted([ + {"id": uid(), "sql": ["status = 2"], "display_name": "active customers"} + ], key=lambda x: x["id"]), + "expressions": sorted([ + {"id": uid(), "alias": "current_date_filter", + "sql": ["(SELECT MAX(date_col) FROM catalog.schema.table)"]} + ], key=lambda x: x["id"]), + "measures": sorted([ + {"id": uid(), "alias": "completion_rate", + "sql": ["ROUND(100.0 * SUM(completed) / COUNT(*), 1)"]} + ], key=lambda x: x["id"]), + } + }, + "benchmarks": { + "questions": sorted([ + {"id": uid(), "question": ["How many active customers?"], + "answer": [{"format": "SQL", "content": [ + "SELECT COUNT(*) FROM catalog.schema.table WHERE status = 2" + ]}]} + ], key=lambda x: x["id"]) + } +} + +# 3. 
Apply via PATCH (400 if lists not sorted by id) +response = requests.patch( + f"{WORKSPACE_URL}/api/2.0/genie/spaces/{SPACE_ID}", + headers=HEADERS, + json={"serialized_space": json.dumps(serialized_space)} +) +print(response.status_code) # 200 = success +``` + +### UI Section Mapping + +| serialized_space section | Genie UI section | +|---|---| +| `instructions.text_instructions` | General Instructions | +| `instructions.join_specs` | Joins | +| `instructions.example_question_sqls` | SQL queries & functions | +| `instructions.sql_snippets.filters` | Common SQL Expressions > Filters | +| `instructions.sql_snippets.expressions` | Common SQL Expressions > Expressions | +| `instructions.sql_snippets.measures` | Common SQL Expressions > Measures | +| `benchmarks.questions` | Benchmarks | +| `config.sample_questions` | Sample Questions (homepage) |
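Since an unsorted list or a second `text_instructions` item makes the PATCH fail with 400, the known constraints can be checked locally before calling the API. A sketch (the helper name is ours):

```python
def check_serialized_space(ss):
    """Assert the known serialized_space constraints before PATCHing."""

    def assert_sorted(items, key, label):
        keys = [item[key] for item in items]
        assert keys == sorted(keys), f"{label} must be sorted by {key}"

    instr = ss.get("instructions", {})
    # General Instructions: at most one item
    assert len(instr.get("text_instructions", [])) <= 1, "text_instructions: max 1 item"
    # All id-keyed lists must be lexicographically sorted by id
    for label in ("text_instructions", "example_question_sqls", "join_specs"):
        assert_sorted(instr.get(label, []), "id", f"instructions.{label}")
    for label, items in instr.get("sql_snippets", {}).items():
        assert_sorted(items, "id", f"sql_snippets.{label}")
    assert_sorted(ss.get("benchmarks", {}).get("questions", []), "id", "benchmarks.questions")
    assert_sorted(ss.get("config", {}).get("sample_questions", []), "id", "config.sample_questions")
    # Tables must be sorted alphabetically by identifier
    assert_sorted(ss.get("data_sources", {}).get("tables", []), "identifier", "data_sources.tables")
```

Call `check_serialized_space(serialized_space)` right before the `requests.patch(...)` step; an `AssertionError` is cheaper to debug than a generic 400 response.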