|
1 | 1 | --- |
2 | 2 | weight: 200 |
3 | 3 | title: "Explore Matches" |
4 | | -description: "Once the files finished loading, you will be presented with the main BDIViz interface." |
| 4 | +description: "Explore schema alignment candidates using the BDIViz interactive heatmap interface." |
5 | 5 | icon: "zoom_in" |
6 | 6 | date: "2025-04-19T13:38:17-04:00" |
7 | | -lastmod: "2025-04-19T13:38:17-04:00" |
| 7 | +lastmod: "2025-04-21T11:00:00-04:00" |
8 | 8 | draft: false |
9 | 9 | toc: true |
10 | 10 | --- |
11 | 11 |
|
12 | | -> Once your files are uploaded, BDIViz will load the full interface for exploring match candidates between your source and target datasets. |
| 12 | +## Overview |
| 13 | + |
| 14 | +After uploading your files, BDIViz launches its main interface, allowing you to visually explore match candidates between your source dataset and the target schema. |
| 15 | + |
| 16 | +--- |
13 | 17 |
|
14 | 18 | ## Interactive Heatmap |
15 | 19 |
|
16 | | -This is the central visual of BDIViz. Each cell in the heatmap represents a match candidate between a source attribute (on the y-axis) and a target attribute (on the x-axis). |
| 20 | +The heatmap is the primary visualization tool in BDIViz. Each cell represents a match candidate between a source attribute (y-axis) and a target attribute (x-axis). |
17 | 21 |
|
18 | 22 |  |
19 | 23 |
|
20 | | - |
21 | | -### Heatmap Components |
| 24 | +### Key Features |
22 | 25 |
|
23 | 26 | {{< tabs tabTotal="4">}} |
24 | | -{{% tab tabName="Heatmap Nodes" %}} |
25 | 27 |
|
26 | | -Each cell in the heatmap represents a match candidate between a source attribute (on the y-axis) and a target attribute (on the x-axis). |
| 28 | +{{% tab tabName="Heatmap Cells" %}} |
27 | 29 |
|
28 | | -- Color intensity represents match strength: darker = higher match score. |
29 | | -- Green indicates accepted matches; red indicates rejected ones. |
30 | | -- Click on a node to inspect further details. |
| 30 | +Each cell corresponds to a potential match: |
31 | 31 |
|
32 | | -{{% /tab %}} |
33 | | -{{% tab tabName="X-Axis" %}} |
| 32 | +- **Color intensity** indicates similarity score (darker = stronger match). |
| 33 | +- **Green** = accepted match, **Red** = rejected match. |
| 34 | +- Click on a cell to inspect match details. |
34 | 35 |
|
35 | | - |
| 36 | +{{% /tab %}} |
36 | 37 |
|
37 | | -The x-axis of the heatmap represents the target schema’s attribute space in a structured, hierarchical layout—specifically designed to support the complexity of biomedical data models like the Genomic Data Commons (GDC). |
| 38 | +{{% tab tabName="X-Axis Hierarchy" %}} |
38 | 39 |
|
39 | | -This hierarchy includes: |
| 40 | + |
40 | 41 |
|
41 | | -- **Categories** (e.g., "clinical," "biospecimen") |
42 | | -- **Nodes** (e.g., "diagnosis," "treatment") |
43 | | -- **Target Attributes** as leaf nodes |
| 42 | +The x-axis displays the target schema's structure as a semantic hierarchy: |
44 | 43 |
|
45 | | -Color coding helps differentiate between categories, while curved connectors visually represent hierarchical relationships. Users can hover over any part of the hierarchy to highlight supercategories, specific categories, or individual columns, aiding navigation and contextual understanding. |
| 44 | +- **Category Level**: e.g., `clinical`, `biospecimen` |
| 45 | +- **Node Level**: e.g., `diagnosis`, `treatment` |
| 46 | +- **Leaf Nodes**: individual target attributes |
46 | 47 |
|
| 48 | +Color and curved connectors help clarify relationships and improve navigation. |
47 | 49 |
|
48 | 50 | {{% /tab %}} |
49 | | -{{% tab tabName="Expandable Node" %}} |
50 | 51 |
|
51 | | -Once you click on a heatmap node, the node expand and shows a stacked heatmap. |
| 52 | +{{% tab tabName="Expandable Nodes" %}} |
52 | 53 |
|
53 | | -- **The upper histogram** shows the distribution of values from the source column. |
| 54 | +Clicking a heatmap node reveals a stacked histogram panel: |
54 | 55 |
|
55 | | -- **The lower histogram** shows the distribution from the target column. |
| 56 | +- **Top Chart**: Source column value distribution |
| 57 | +- **Bottom Chart**: Target column value distribution |
56 | 58 |
|
57 | | -These visuals help assess whether the values in the two columns are statistically or categorically aligned. For example, if both columns share similar categories with comparable frequency distributions, it suggests a strong match. |
| 59 | +Use this to evaluate whether the two columns share meaningful overlap. |
58 | 60 |
|
59 | | -_**Interpretation Tip:** Matching distributions with similar peaks (e.g., a high frequency of "Male" and "Female" values) may indicate semantic alignment. Dissimilar distributions can signal a mismatch or a need for closer inspection._ |
| 61 | +**Tip:** Similar distributions (e.g., shared categories like "Male" and "Female") often suggest semantic alignment. |
60 | 62 |
|
61 | 63 |  |
62 | 64 |
|
63 | 65 | {{% /tab %}} |
64 | | -{{% tab tabName="Top Tabs (above heatmap)" %}} |
65 | 66 |
|
66 | | - |
67 | | - |
68 | | -- **All:** Displays all potential matches. |
| 67 | +{{% tab tabName="Heatmap Top Tabs" %}} |
69 | 68 |
|
70 | | -- **Accepted:** Shows only those that have been manually confirmed. |
| 69 | + |
71 | 70 |
|
72 | | -- **Unmatched:** Lists only those source attributes with no confirmed match. |
| 71 | +The top-level filter tabs help narrow your focus: |
73 | 72 |
|
74 | | -- **Expand On Hover:** _This toggle controls if the expandable node will be expanded on mouse hover or not(on click)._ |
| 73 | +- **All**: View all candidate matches |
| 74 | +- **Accepted**: Only show confirmed matches |
| 75 | +- **Unmatched**: Only show source columns with no confirmed match |
| 76 | +- **Expand on Hover**: Toggle whether expanded histograms appear on hover or click |
75 | 77 |
|
76 | 78 | {{% /tab %}} |
| 79 | + |
77 | 80 | {{< /tabs >}} |
78 | 81 |
|
79 | | -## Filters and Search Tools |
| 82 | +--- |
| 83 | + |
| 84 | +## Filters and Search Controls |
80 | 85 |
|
81 | | -BDIViz provides several ways to customize and narrow your view: |
| 86 | +Fine-tune the heatmap view using a suite of filters: |
82 | 87 |
|
83 | 88 | {{< tabs tabTotal="4">}} |
84 | | -{{% tab tabName="Source Attribute" %}} |
85 | 89 |
|
86 | | -Lets you choose which column to focus on. |
| 90 | +{{% tab tabName="Source Attribute Selector" %}} |
| 91 | + |
| 92 | +Select a specific source column to examine. |
87 | 93 |
|
88 | 94 |  |
89 | 95 |
|
90 | | -Once click you will able to see this dropdown menu showing all the source attributes: |
91 | | -- **All:** Shows all source attributes in a paginatable manner. |
92 | | -- **Attributes in green:** The attributes that already have at least one accpeted candidate. |
93 | | -- **Attributes in grey:** The attributes that is discarded by user. |
| 96 | +Dropdown key: |
| 97 | + |
| 98 | +- **Green**: Already matched |
| 99 | +- **Grey**: Manually discarded |
| 100 | +- **All**: Show all source attributes |
94 | 101 |
|
95 | 102 |  |
96 | 103 |
|
97 | 104 | {{% /tab %}} |
| 105 | + |
98 | 106 | {{% tab tabName="Similarity Threshold" %}} |
99 | 107 |
|
100 | | -Adjusts the minimum score a candidate must meet to appear. Values range from 0 (show all) to 1 (only highest similarity). |
| 108 | +Set a minimum similarity score for visible candidates. Range: `0.0 – 1.0`. |
| 109 | + |
| 110 | +- `0.0`: Show all matches |
| 111 | +- `1.0`: Show only perfect matches |
101 | 112 |
|
102 | 113 | {{% /tab %}} |
103 | | -{{% tab tabName="Similar Attributes" %}} |
104 | 114 |
|
105 | | -Determines how many similar source columns appear in the y-axis for better comparative context. |
| 115 | +{{% tab tabName="Similar Attributes Displayed" %}} |
106 | 116 |
|
107 | | -**Note:** This only apply when **Source Attribute** is not set to **All**. |
| 117 | +Controls how many similar source attributes appear on the heatmap when a single source column is selected. |
108 | 118 |
|
| 119 | +> Note: Only applies when Source Attribute ≠ "All". |
109 | 120 |
|
110 | 121 | {{% /tab %}} |
111 | | -{{% tab tabName="Search Bar" %}} |
112 | 122 |
|
113 | | -Lets you highlight specific target attributes by name or keyword. |
| 123 | +{{% tab tabName="Search Bar" %}} |
114 | 124 |
|
| 125 | +Quickly locate and highlight target attributes by name or keyword. |
115 | 126 |
|
116 | 127 | {{% /tab %}} |
| 128 | + |
117 | 129 | {{< /tabs >}} |
118 | 130 |
|
| 131 | +--- |
119 | 132 |
|
120 | | -## Lower Panel Visuals |
| 133 | +## Lower Panel: Match Details |
121 | 134 |
|
122 | | -When a node is selected from the heatmap, the bottom panels expand to provide deeper insights: |
| 135 | +Clicking on any heatmap node reveals deeper insights below: |
123 | 136 |
|
124 | 137 | {{< tabs tabTotal="2">}} |
| 138 | + |
125 | 139 | {{% tab tabName="Value Comparisons" %}} |
126 | 140 |
|
127 | 141 |  |
128 | 142 |
|
129 | | -Visual representation of fuzzy-matched value pairs between the selected source and target columns. |
130 | | - |
131 | | -Each row shows a unique source value and its closest string-matched counterpart(s) from the target. |
| 143 | +Visualizes string-based fuzzy matching between source and target values. |
132 | 144 |
|
133 | | -Helps validate or question whether two attributes should be considered a match. |
| 145 | +- Each row: one source value + its closest matches |
| 146 | +- Use this to validate whether the mapping is semantically and syntactically justified |
134 | 147 |
|
135 | 148 | {{% /tab %}} |
| 149 | + |
136 | 150 | {{% tab tabName="UpSet Plot" %}} |
137 | 151 |
|
138 | 152 |  |
139 | 153 |
|
140 | | -the UpSet Plot provides a detailed breakdown of how each individual matcher contributed to the overall score of the selected candidate. |
| 154 | +Visualizes how different matchers contributed to the candidate match score. |
141 | 155 |
|
142 | | -Each matcher is represented as a row, and each column represents a candidate match. A dot is shown where a matcher supports a given match. |
| 156 | +- Each row: a matcher |
| 157 | +- Each column: a candidate |
| 158 | +- Dots indicate support from that matcher |
143 | 159 |
|
144 | 160 | {{% /tab %}} |
| 161 | + |
145 | 162 | {{< /tabs >}} |
146 | 163 |
|
| 164 | +--- |
| 165 | + |
147 | 166 | ## LLM Agent Panel |
148 | 167 |
|
149 | 168 |  |
150 | 169 |
|
151 | | -- **Overview Diagnosis:** Summarizes whether the current match is likely valid or not, based on metadata, column names, and prior interactions. |
| 170 | +An embedded LLM-powered assistant provides contextual insights: |
152 | 171 |
|
153 | | -- **Explanation Cards:** |
| 172 | +- **Diagnosis Summary**: Determines if a match is likely valid based on metadata, column names, and user history |
| 173 | +- **Explanation Cards**: Highlight key reasons (e.g., name similarity, value overlap) |
| 174 | + - Click to expand for more detail |
| 175 | + - Provide feedback via 👍/👎 to refine model accuracy |
| 176 | +- **Target Schema Metadata**: Enriched descriptions pulled from sources such as GDC |
154 | 177 |
|
155 | | - - Each card explains one rationale (e.g., shared terms, similar distributions, matching patterns). |
| 178 | +--- |
156 | 179 |
|
157 | | - - Click to expand each explanation. |
| 180 | +## What’s Next? |
158 | 181 |
|
159 | | - - Feedback Buttons: Let you provide a **thumbs-up** or **thumbs-down** to help improve the model’s future reasoning. |
| 182 | +After reviewing the matches: |
160 | 183 |
|
161 | | -- **Target Schema Descriptions:** Additional metadata from sources like GDC are displayed for further context. |
| 184 | +- Accept or reject individual match suggestions |
| 185 | +- Use filtering to prioritize high-confidence matches |
| 186 | +- Proceed to export, refine, or apply your matched schema for downstream harmonization tasks |