Commit 41caf8c

feat: update harmonization assistant, retouch
1 parent f777c62 commit 41caf8c

22 files changed: +213 -166 lines changed


content/docs/get-started/verify-value-mappings.md

Lines changed: 41 additions & 25 deletions
@@ -1,7 +1,7 @@
11
---
22
weight: 300
33
title: "Verify Value Matching"
4-
description: "Verify the curated source attrbute value to target attribute value mappings"
4+
description: "Verify curated mappings from source attribute values to target attribute values"
55
icon: "match"
66
date: "2025-10-24T17:58:00-00:00"
77
lastmod: "2025-10-24T17:58:00-00:00"
@@ -11,22 +11,22 @@ toc: true
1111

1212
## Installation 🛠️
1313

14-
For system-specific installation instructions:
15-
- AMD64 architecture: Follow the [Linux/Unix installation guide]({{% relref "/docs/installation/install-on-linux" %}})
16-
- ARM64 architecture (Apple Silicon): Follow the [MacOS installation guide]({{% relref "/docs/installation/install-on-macos" %}})
14+
For system-specific installation instructions (**pick ONE** for your **CPU architecture**):
15+
- **AMD64**: follow the [Linux/Unix installation guide]({{% relref "/docs/installation/install-on-linux" %}})
16+
- **ARM64 (Apple Silicon)**: follow the [macOS installation guide]({{% relref "/docs/installation/install-on-macos" %}})
1717

1818
## Overview ✅
1919

20-
This guide walks you through verifying value mappings after schema alignment. You will enable Developer Mode, upload a ground truth CSV for value mapping verification, and use the heatmap and Data Wrangler to confirm results.
20+
This guide walks you through verifying value mappings after schema alignment. You will **enable Developer Mode**, **upload a ground truth CSV** for value mapping verification, and use the **heatmap** and **Data Wrangler** to confirm results.
2121

2222
---
2323

2424
## Start a New Session 🗂️
2525

2626
![new-session](images/new-session.png)
2727

28-
- On the far left of the top navbar, enter a name in the **New session** text input to create a new session.
29-
- The new session inherits data from the default session, so you can proceed without re-uploading.
28+
- On the far left of the top navbar, **enter a name** in the **New session** field to create a new session.
29+
- The new session **inherits data** from the default session, so you can **proceed without re-uploading**.
3030

3131

3232

@@ -51,7 +51,7 @@ For verifying value mappings, prepare the following:
5151

5252
- **Source table (required)**: your raw dataset CSV, e.g., `Cao.csv`.
5353
- **Target table (not needed)**: leave empty to use the built-in **GDC** schema as target.
54-
- **Ground truth CSV (required for verification)**: a file with exactly four columns:
54+
- **Ground truth CSV (required for verification)**: a file with **exactly four columns**:
5555
- `source_attribute`
5656
- `target_attribute`
5757
- `source_value`
@@ -68,61 +68,77 @@ Race,race,White,white
6868
Gender,gender,Female,female
6969
```
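
Before uploading, it can help to check the ground truth file locally. The following is a minimal sketch, not part of the tool: the function name `validate_ground_truth` is hypothetical, and it assumes the header must list the four column names in the order shown in the example above (the guide only states that the file has exactly four columns).

```python
import csv
import io

# Column names as listed in the guide; requiring this exact order is an assumption.
EXPECTED_COLUMNS = ["source_attribute", "target_attribute", "source_value", "target_value"]


def validate_ground_truth(text):
    """Return a list of problems found in ground truth CSV text (empty if none)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    if [name.strip() for name in header] != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {header}")
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(
                f"line {reader.line_num}: expected 4 fields, got {len(row)}"
            )
    return problems


sample = """source_attribute,target_attribute,source_value,target_value
FIGO_stage,figo_stage,IA,Stage IA
Race,race,White,white
Gender,gender,Female,female
"""
print(validate_ground_truth(sample))
```

To check a file on disk, read it first (for example with `open(path, newline="").read()`) and pass the text in. An empty result means the header and every row have the expected shape.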
7070

71-
After files are ready, click the blue **Start Value Matching** button at the bottom-right to launch the verification task.
71+
After files are ready, click the blue **Start Value Matching** button at the **bottom-right** to launch the verification task.
7272

7373
---
7474

7575
## Review Schema Matches on the Heatmap 🔥
7676

77-
- The heatmap displays candidate matches between source (y-axis) and GDC target attributes (x-axis). Ground-truth schema pairs included in your CSV should appear as heatmap cells.
78-
- **Hover** a cell to preview details; **click** to expand the embedded node with distributions and comparisons.
79-
- Use the control panel filters to narrow candidates by similarity, attribute, or status.
77+
- The heatmap displays candidate matches between **source (y-axis)** and **GDC target attributes (x-axis)**. Ground-truth schema pairs included in your CSV should appear as heatmap cells.
78+
- **Hover** a cell to preview details; **click** to expand the embedded node with **distributions** and **comparisons**.
79+
- Use the control panel **filters** to narrow candidates by **similarity**, **attribute**, or **status**.
8080

8181
![interactive-heatmap](images/interactive-heatmap.gif)
8282

83-
> Need a refresher on filters and controls? See {{% relref "/docs/manual/explore-matches" %}}.
83+
> Need a refresher on filters and controls? See [explore-matches]({{% relref "/docs/manual/explore-matches" %}}).
8484
8585
---
8686

8787
## Verify in the Data Wrangler 📊
8888

8989
When you click a heatmap node:
9090

91-
- The lower **DATA WRANGLER** view focuses on the selected source attribute (highlighted with a blue background).
92-
- The matched target attribute appears appended to the right of the source attribute.
93-
- If a value mapping is provided in your ground truth, the target column shows the mapped target values applied to each source value.
94-
- If no value mapping is provided for a source value, the corresponding cell in the target column will be empty.
91+
- The lower **DATA WRANGLER** view focuses on the selected **source attribute** (highlighted with a blue background).
92+
- The matched **target attribute** appears appended to the right of the source attribute.
93+
- If a value mapping is provided in your ground truth, the **target column** shows the **mapped target values** applied to each source value.
94+
- If no value mapping is provided for a source value, the corresponding cell in the **target column** will be **empty**.
9595

96-
Use this view to scan for mismatches, missing mappings, or unexpected blanks. Adjust your ground truth and rerun if needed.
96+
Use this view to scan for **mismatches**, **missing mappings**, or **unexpected blanks**. Adjust your ground truth and **rerun** if needed.
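
The behavior described above (mapped values filled in, blanks where no mapping exists) can be sketched outside the UI to predict which cells will be empty. This is an illustration only: the helper names are hypothetical, the `Other` value is made up for the example, and the tool's actual implementation may differ.

```python
def apply_value_mapping(source_values, mapping):
    """Map each source value to its target value; unmapped values become ''."""
    return [mapping.get(value, "") for value in source_values]


def unmapped_values(source_values, mapping):
    """Return the distinct source values with no mapping, in first-seen order."""
    missing = []
    for value in source_values:
        if value not in mapping and value not in missing:
            missing.append(value)
    return missing


# Mapping taken from the ground truth example (Race -> race).
race_mapping = {"White": "white"}
# Hypothetical source column; "Other" has no ground truth entry.
race_column = ["White", "Other", "White"]

target_column = apply_value_mapping(race_column, race_mapping)
missing = unmapped_values(race_column, race_mapping)
print(target_column)
print(missing)
```

Because the lookup is keyed on the source value, every row that shares a source value receives the same target value, and each entry in `missing` corresponds to blank cells you should expect in the target column.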
9797

9898
![data-wrangler](images/data-wrangler.gif)
9999

100100
---
101101

102102
## Edit and Correct Mapped Values ✏️
103103

104-
If a mapped value looks incorrect:
104+
If a mapped value looks **incorrect**:
105105

106106
- **Hover the cell** in the mapped target column within the **DATA WRANGLER** table.
107107
- Choose one of the following:
108-
- **Inline edit**: Click the current mapped value to edit directly, then press Enter to apply.
108+
- **Inline edit**: Click the current mapped value to edit directly, then press **Enter** to apply.
109109
![inline-edit](images/data-wrangler-inline-edit.png)
110-
- **Edit via popover**: Click the edit icon on the right side of the cell to open a popover listing all available target schema values as selectable chips. Click a chip to apply it.
110+
- **Edit via popover**: Click the **edit icon** on the right side of the cell to open a popover listing all available **target schema values** as selectable chips. **Click a chip** to apply it.
111111
![inline-edit](images/data-wrangler-popover-edit.png)
112112

113-
**Note:** When you change the mapped value for a specific `source_value`, the update is applied to all rows in this source attribute that share the same `source_value`.
113+
**Note:** When you change the mapped value for a specific `source_value`, the update is applied to **all rows** in this source attribute that share the same `source_value`.
114+
115+
## Use the Harmonization Assistant for Value Mapping Suggestions
116+
117+
Use the **Harmonization Assistant** to quickly **suggest** or **apply** value mappings for both **categorical** and **numerical** columns.
118+
119+
- **Select a heatmap node**, then **right-click** to open the **Harmonization Assistant** panel.
120+
- **Ask the agent** with a clear prompt:
121+
- Example (categorical): "Map source values in `Race` to GDC `race` values."
122+
- Example (numerical): "Suggest bins for `BMI` aligned to GDC categories."
123+
- **Review and apply** the suggested mappings to update the **target column**.
124+
125+
To reset the conversation, click the **second** button in the panel's **top-right** to **clear the chat history**.
126+
127+
See the full assistant guide: {{% relref "/docs/manual/llm-agent" %}}.
128+
129+
![harmonization-assistant-value](images/harmonization-assistant-value.gif)
114130

115131
---
116132

117133
## Export Mapping Results 📤
118134

119-
After applying all changes, export your finalized value mappings:
135+
After applying all changes, **export** your finalized value mappings:
120136

121137
- Open the **Shortcut Panel** (top-left).
122138
- Click **Export Matching Results**.
123-
- In the format options, select **CSV 4-column** to generate a ground truth-like file.
139+
- In the format options, select **CSV 4-column** to generate a **ground truth-like** file.
124140

125-
The exported CSV contains four columns:
141+
The exported CSV contains **four columns**:
126142

127143
- `source_attribute`
128144
- `target_attribute`

public/docs/get-started/dataset-to-gdc/index.html

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -918,8 +918,8 @@ <h2 id="further-resources-">Further Resources 📖 <a href="#further-resources-"
918918
id: 4 ,
919919
href: "\/bdi-viz-manual\/docs\/get-started\/verify-value-mappings\/",
920920
title: "Verify Value Matching",
921-
description: "Verify the curated source attrbute value to target attribute value mappings",
922-
content: "Installation 🛠️ linkFor system-specific installation instructions:\nAMD64 architecture: Follow the Linux/Unix installation guide ARM64 architecture (Apple Silicon): Follow the MacOS installation guide Overview ✅ linkThis guide walks you through verifying value mappings after schema alignment. You will enable Developer Mode, upload a ground truth CSV for value mapping verification, and use the heatmap and Data Wrangler to confirm results.\nStart a New Session 🗂️ link On the far left of the top navbar, enter a name in the New session text input to create a new session. The new session inherits data from the default session, so you can proceed without re-uploading. Toggle Developer Mode 🧰 link How to open: Click the arrow icon on the top-left corner to open the side panel, then toggle Developer Mode. What it enables: Upload pre-annotated ground truth for schema matching (attribute pairs) Upload pre-annotated ground truth for value mappings (value-to-value) Upload custom matchers (not covered here) Developer Mode surfaces additional upload slots and debugging panels designed for power users and evaluators.\nPrepare Data and Start the Task 📦 linkFor verifying value mappings, prepare the following:\nSource table (required): your raw dataset CSV, e.g., Cao.csv. Target table (not needed): leave empty to use the built-in GDC schema as target. 
Ground truth CSV (required for verification): a file with exactly four columns: source_attribute target_attribute source_value target_value Example (subset):\nsource_attribute,target_attribute,source_value,target_value FIGO_stage,figo_stage,IA,Stage IA FIGO_stage,figo_stage,IIIC1,Stage IIIC1 Ethnicity,ethnicity,Hispanic or Latino,hispanic or latino Race,race,White,white Gender,gender,Female,female After files are ready, click the blue Start Value Matching button at the bottom-right to launch the verification task.\nReview Schema Matches on the Heatmap 🔥 link The heatmap displays candidate matches between source (y-axis) and GDC target attributes (x-axis). Ground-truth schema pairs included in your CSV should appear as heatmap cells. Hover a cell to preview details; click to expand the embedded node with distributions and comparisons. Use the control panel filters to narrow candidates by similarity, attribute, or status. Need a refresher on filters and controls? See /bdi-viz-manual/docs/manual/explore-matches/.\nVerify in the Data Wrangler 📊 linkWhen you click a heatmap node:\nThe lower DATA WRANGLER view focuses on the selected source attribute (highlighted with a blue background). The matched target attribute appears appended to the right of the source attribute. If a value mapping is provided in your ground truth, the target column shows the mapped target values applied to each source value. If no value mapping is provided for a source value, the corresponding cell in the target column will be empty. Use this view to scan for mismatches, missing mappings, or unexpected blanks. Adjust your ground truth and rerun if needed.\nEdit and Correct Mapped Values ✏️ linkIf a mapped value looks incorrect:\nHover the cell in the mapped target column within the DATA WRANGLER table. 
Choose one of the following: Inline edit: Click the current mapped value to edit directly, then press Enter to apply.\nEdit via popover: Click the edit icon on the right side of the cell to open a popover listing all available target schema values as selectable chips. Click a chip to apply it. Note: When you change the mapped value for a specific source_value, the update is applied to all rows in this source attribute that share the same source_value.\nExport Mapping Results 📤 linkAfter applying all changes, export your finalized value mappings:\nOpen the Shortcut Panel (top-left). Click Export Matching Results. In the format options, select CSV 4-column to generate a ground truth-like file. The exported CSV contains four columns:\nsource_attribute target_attribute source_value target_value "
921+
description: "Verify curated mappings from source attribute values to target attribute values",
922+
content: "Installation 🛠️ linkFor system-specific installation instructions (pick ONE for your CPU architecture):\nAMD64: follow the Linux/Unix installation guide ARM64 (Apple Silicon): follow the macOS installation guide Overview ✅ linkThis guide walks you through verifying value mappings after schema alignment. You will enable Developer Mode, upload a ground truth CSV for value mapping verification, and use the heatmap and Data Wrangler to confirm results.\nStart a New Session 🗂️ link On the far left of the top navbar, enter a name in the New session field to create a new session. The new session inherits data from the default session, so you can proceed without re-uploading. Toggle Developer Mode 🧰 link How to open: Click the arrow icon on the top-left corner to open the side panel, then toggle Developer Mode. What it enables: Upload pre-annotated ground truth for schema matching (attribute pairs) Upload pre-annotated ground truth for value mappings (value-to-value) Upload custom matchers (not covered here) Developer Mode surfaces additional upload slots and debugging panels designed for power users and evaluators.\nPrepare Data and Start the Task 📦 linkFor verifying value mappings, prepare the following:\nSource table (required): your raw dataset CSV, e.g., Cao.csv. Target table (not needed): leave empty to use the built-in GDC schema as target. 
Ground truth CSV (required for verification): a file with exactly four columns: source_attribute target_attribute source_value target_value Example (subset):\nsource_attribute,target_attribute,source_value,target_value FIGO_stage,figo_stage,IA,Stage IA FIGO_stage,figo_stage,IIIC1,Stage IIIC1 Ethnicity,ethnicity,Hispanic or Latino,hispanic or latino Race,race,White,white Gender,gender,Female,female After files are ready, click the blue Start Value Matching button at the bottom-right to launch the verification task.\nReview Schema Matches on the Heatmap 🔥 link The heatmap displays candidate matches between source (y-axis) and GDC target attributes (x-axis). Ground-truth schema pairs included in your CSV should appear as heatmap cells. Hover a cell to preview details; click to expand the embedded node with distributions and comparisons. Use the control panel filters to narrow candidates by similarity, attribute, or status. Need a refresher on filters and controls? See explore-matches.\nVerify in the Data Wrangler 📊 linkWhen you click a heatmap node:\nThe lower DATA WRANGLER view focuses on the selected source attribute (highlighted with a blue background). The matched target attribute appears appended to the right of the source attribute. If a value mapping is provided in your ground truth, the target column shows the mapped target values applied to each source value. If no value mapping is provided for a source value, the corresponding cell in the target column will be empty. Use this view to scan for mismatches, missing mappings, or unexpected blanks. Adjust your ground truth and rerun if needed.\nEdit and Correct Mapped Values ✏️ linkIf a mapped value looks incorrect:\nHover the cell in the mapped target column within the DATA WRANGLER table. 
Choose one of the following: Inline edit: Click the current mapped value to edit directly, then press Enter to apply.\nEdit via popover: Click the edit icon on the right side of the cell to open a popover listing all available target schema values as selectable chips. Click a chip to apply it. Note: When you change the mapped value for a specific source_value, the update is applied to all rows in this source attribute that share the same source_value.\nUse the Harmonization Assistant for Value Mapping Suggestions linkUse the Harmonization Assistant to quickly suggest or apply value mappings for both categorical and numerical columns.\nSelect a heatmap node, then right-click to open the Harmonization Assistant panel. Ask the agent with a clear prompt: Example (categorical): “Map source values in Race to GDC race values.” Example (numerical): “Suggest bins for BMI aligned to GDC categories.” Review and apply the suggested mappings to update the target column. To reset the conversation, click the second button in the panel’s top-right to clear the chat history.\nSee the full assistant guide: /bdi-viz-manual/docs/manual/llm-agent/.\nExport Mapping Results 📤 linkAfter applying all changes, export your finalized value mappings:\nOpen the Shortcut Panel (top-left). Click Export Matching Results. In the format options, select CSV 4-column to generate a ground truth-like file. The exported CSV contains four columns:\nsource_attribute target_attribute source_value target_value "
923923
}
924924
);
925925
index.add(
