Skip to content

Commit 0459c83

Browse files
Copilotjd-laragithub-actions[bot]daniel-thom
authored
Document associations database schemas for time series and supplemental attributes (#508)
* Initial plan * Add associations database documentation page Co-authored-by: jd-lara <[email protected]> * formatter * Update docs/src/dev_guide/associations_database.md Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update docs/src/dev_guide/associations_database.md Co-authored-by: Daniel Thom <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: jd-lara <[email protected]> Co-authored-by: Jose Daniel Lara <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Daniel Thom <[email protected]>
1 parent 5eab57e commit 0459c83

File tree

2 files changed

+331
-0
lines changed

2 files changed

+331
-0
lines changed

docs/make.jl

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ pages = OrderedDict(
2121
"dev_guide/components_and_container.md",
2222
"dev_guide/auto_generation.md",
2323
"dev_guide/time_series.md",
24+
"dev_guide/associations_database.md",
2425
"dev_guide/recorder.md",
2526
"dev_guide/tests.md",
2627
"dev_guide/logging.md",
Lines changed: 330 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,330 @@
1+
# Associations Database Schema
2+
3+
!!! note "For Maintainers and Contributors"
4+
5+
6+
This page documents the internal database schemas used by InfrastructureSystems.jl to manage associations between components and their time series data and supplemental attributes. This information is intended for maintainers and contributors working on the codebase. **End users should not need to interact with these databases directly.**
7+
8+
## Overview
9+
10+
InfrastructureSystems.jl uses SQLite databases to efficiently track associations between:
11+
12+
- **Components** and **Time Series data**
13+
- **Components** and **Supplemental Attributes**
14+
15+
These associations are managed under the hood to enable:
16+
17+
- Fast lookups of time series and attributes attached to components
18+
- Efficient querying and filtering
19+
- Proper lifecycle management (adding, removing, updating references)
20+
- Serialization and deserialization support
21+
22+
The package maintains two separate databases:
23+
24+
1. **Time Series Metadata Store** - tracks time series associations
25+
2. **Supplemental Attribute Associations** - tracks supplemental attribute associations
26+
27+
## Design Rationale
28+
29+
### Why Separate Databases?
30+
31+
Time series metadata and supplemental attribute associations are stored in independent SQLite databases rather than different tables in the same database. This design decision is driven by serialization requirements:
32+
33+
**Background:**
34+
35+
- Time series metadata is always persisted as a SQLite file during serialization
36+
- The SQLite file is written as an HDF5 dataset in the time series data file
37+
- Serialization produces: `system.json`, `system_metadata.json`, and `system_time_series.h5`
38+
- If there is no time series in the system, only `system.json` and `system_metadata.json` are produced
39+
40+
**The Problem:**
41+
If supplemental attribute associations were in the same database as time series metadata, and the system had supplemental attributes but no time series, serialization would produce an extra file. The team required that supplemental attribute associations be written to the system JSON file when there is no time series data.
42+
43+
**The Solution:**
44+
Keeping them as separate databases simplifies the code by avoiding the complexity of temporarily sharing a database across serialization and deepcopy operations. The supplemental attribute database is always ephemeral (in-memory only), while the time series metadata can be persisted.
45+
46+
## Time Series Metadata Store
47+
48+
The `TimeSeriesMetadataStore` manages associations between time series data and components/supplemental attributes. It uses an in-memory SQLite database for fast access.
49+
50+
### Database Tables
51+
52+
#### 1. `time_series_associations` Table
53+
54+
This is the primary table that stores the associations between time series data and owners (components or supplemental attributes).
55+
56+
**Schema:**
57+
58+
| Column Name | Type | Description |
59+
|:--------------------------- |:------- |:------------------------------------------------------------------------ |
60+
| `id` | INTEGER | Primary key, auto-incremented |
61+
| `time_series_uuid` | TEXT | UUID of the time series data array |
62+
| `time_series_type` | TEXT | Type name of the time series (e.g., "SingleTimeSeries", "Deterministic") |
63+
| `initial_timestamp` | TEXT | ISO 8601 formatted initial timestamp |
64+
| `resolution` | TEXT | Resolution encoded as [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations) |
65+
| `horizon` | TEXT | ISO 8601 formatted forecast horizon (NULL for static time series) |
66+
| `interval` | TEXT | ISO 8601 formatted forecast interval (NULL for static time series) |
67+
| `window_count` | INTEGER | Number of forecast windows (NULL for static time series) |
68+
| `length` | INTEGER | Length of static time series (NULL for forecasts) |
69+
| `name` | TEXT | User-defined name for the time series |
70+
| `owner_uuid` | TEXT | UUID of the component or supplemental attribute that owns this |
71+
| `owner_type` | TEXT | Type name of the owner |
72+
| `owner_category` | TEXT | Either "Component" or "SupplementalAttribute" |
73+
| `features` | TEXT | JSON string of feature key-value pairs for filtering |
74+
| `scaling_factor_multiplier` | JSON | Optional function for scaling (NULL if not used) |
75+
| `metadata_uuid` | TEXT | UUID of the metadata object |
76+
| `units` | TEXT | Optional units specification (NULL if not used) |
77+
78+
**Indexes:**
79+
80+
- `by_c_n_tst_features`: Composite index on `(owner_uuid, time_series_type, name, resolution, features)` - optimized for lookups by component with specific time series parameters
81+
- `by_ts_uuid`: Index on `(time_series_uuid)` - optimized for finding all owners of a specific time series
82+
83+
**Design Notes:**
84+
85+
- The table supports both static time series and forecasts. Forecast-specific columns (`horizon`, `interval`, `window_count`) are NULL for static time series.
86+
- The `features` column stores a JSON string of key-value pairs that can be used for flexible filtering and querying.
87+
- All `Dates.Period` values are stored as ISO 8601 strings for portability.
88+
- The `metadata_uuid` allows multiple associations to reference the same metadata object (stored in memory).
89+
90+
#### 2. `key_value_store` Table
91+
92+
Stores metadata about the database itself.
93+
94+
**Schema:**
95+
96+
| Column Name | Type | Description |
97+
|:----------- |:---- |:----------- |
98+
| `key` | TEXT | Primary key |
99+
| `value` | JSON | JSON value |
100+
101+
**Current Keys:**
102+
103+
- `version`: Stores the time series metadata format version (currently "1.0.0")
104+
105+
### Common Queries
106+
107+
The following types of queries are optimized by the indexes:
108+
109+
1. **Find all time series for a component:**
110+
111+
```sql
112+
SELECT * FROM time_series_associations WHERE owner_uuid = ?
113+
```
114+
115+
2. **Find specific time series by name and type:**
116+
117+
```sql
118+
SELECT * FROM time_series_associations
119+
WHERE owner_uuid = ? AND name = ? AND time_series_type = ?
120+
```
121+
3. **Find time series with specific features:**
122+
123+
```sql
124+
SELECT * FROM time_series_associations
125+
WHERE owner_uuid = ? AND features LIKE ?
126+
```
127+
4. **Find all owners of a time series:**
128+
129+
```sql
130+
SELECT DISTINCT owner_uuid FROM time_series_associations
131+
WHERE time_series_uuid = ?
132+
```
133+
134+
### Migrations
135+
136+
The database schema has evolved over time. Migration code handles upgrading from older formats:
137+
138+
- **v2.3 Migration**: Converted from a single metadata table with JSON columns to the current two-table structure
139+
- **v2.4 Migration**: Converted period storage from integer milliseconds to ISO 8601 strings
140+
141+
Migration functions (`_migrate_from_v2_3`, `_migrate_from_v2_4`) are maintained in `time_series_metadata_store.jl` for backward compatibility.
142+
143+
## Supplemental Attribute Associations
144+
145+
The `SupplementalAttributeAssociations` manages associations between supplemental attributes and components. It uses an in-memory SQLite database that is always ephemeral.
146+
147+
### Database Table
148+
149+
#### `supplemental_attributes` Table
150+
151+
**Schema:**
152+
153+
| Column Name | Type | Description |
154+
|:---------------- |:---- |:--------------------------------------- |
155+
| `attribute_uuid` | TEXT | UUID of the supplemental attribute |
156+
| `attribute_type` | TEXT | Type name of the supplemental attribute |
157+
| `component_uuid` | TEXT | UUID of the component |
158+
| `component_type` | TEXT | Type name of the component |
159+
160+
**Indexes:**
161+
162+
- `by_attribute`: Composite index on `(attribute_uuid, component_uuid, component_type)` - optimized for finding components associated with an attribute
163+
- `by_component`: Composite index on `(component_uuid, attribute_uuid, attribute_type)` - optimized for finding attributes associated with a component
164+
165+
**Design Notes:**
166+
167+
- The schema is simpler than the time series associations because supplemental attributes have less metadata
168+
- Both attribute and component information is stored to enable bidirectional lookups
169+
- The indexes support fast queries in both directions (attribute → components and component → attributes)
170+
171+
### Common Queries
172+
173+
1. **Find all attributes for a component:**
174+
175+
```sql
176+
SELECT DISTINCT attribute_uuid FROM supplemental_attributes
177+
WHERE component_uuid = ?
178+
```
179+
180+
2. **Find attributes of a specific type for a component:**
181+
182+
```sql
183+
SELECT DISTINCT attribute_uuid FROM supplemental_attributes
184+
WHERE component_uuid = ? AND attribute_type = ?
185+
```
186+
3. **Find all components with an attribute:**
187+
188+
```sql
189+
SELECT DISTINCT component_uuid FROM supplemental_attributes
190+
WHERE attribute_uuid = ?
191+
```
192+
4. **Check if an association exists:**
193+
194+
```sql
195+
SELECT attribute_uuid FROM supplemental_attributes
196+
WHERE attribute_uuid = ? AND component_uuid = ?
197+
LIMIT 1
198+
```
199+
200+
## Performance Considerations
201+
202+
### Statement Caching
203+
204+
Both database implementations cache compiled SQL statements to avoid the overhead of re-parsing queries. This saves approximately 3-4 microseconds per query.
205+
206+
- `TimeSeriesMetadataStore` maintains a `cached_statements` dictionary
207+
- `SupplementalAttributeAssociations` maintains a `cached_statements` dictionary
208+
- Frequently-used queries benefit most from caching
209+
210+
### Index Strategy
211+
212+
**Time Series Metadata:**
213+
214+
1. Optimize for user queries by component/attribute UUID with name, type, and resolution
215+
2. Optimize for deduplication checks during `add_time_series!`
216+
3. Optimize for metadata retrieval by time series UUID
217+
218+
**Supplemental Attributes:**
219+
220+
1. Optimize for bidirectional lookups (attribute ↔ component)
221+
2. Support filtering by type in both directions
222+
223+
### Database Location
224+
225+
- Both databases are in-memory (`SQLite.DB()`) for performance
226+
- The time series metadata database can be backed up to disk for serialization
227+
- The supplemental attribute database is never persisted (associations are stored in JSON during serialization)
228+
229+
## Serialization Behavior
230+
231+
### Time Series Metadata
232+
233+
During serialization:
234+
235+
1. The in-memory database is backed up to a temporary file
236+
2. Indexes are dropped from the backup (to reduce file size)
237+
3. The database file is written as an HDF5 dataset in `system_time_series.h5`
238+
239+
During deserialization:
240+
241+
1. The SQLite database is extracted from the HDF5 file
242+
2. It's loaded into an in-memory database
243+
3. Indexes are recreated for performance
244+
4. Metadata objects are reconstructed and cached in memory
245+
246+
### Supplemental Attribute Associations
247+
248+
During serialization:
249+
250+
1. All associations are extracted as records (tuples of UUIDs and types)
251+
2. Records are written to the JSON file
252+
253+
During deserialization:
254+
255+
1. Records are read from the JSON file
256+
2. A new in-memory database is created
257+
3. Records are bulk-inserted using `executemany` for efficiency
258+
4. Indexes are created
259+
260+
## Implementation Files
261+
262+
- **Time Series Metadata Store**: [`src/time_series_metadata_store.jl`](https://github.com/NREL-Sienna/InfrastructureSystems.jl/blob/main/src/time_series_metadata_store.jl)
263+
- **Supplemental Attribute Associations**: [`src/supplemental_attribute_associations.jl`](https://github.com/NREL-Sienna/InfrastructureSystems.jl/blob/main/src/supplemental_attribute_associations.jl)
264+
- **SQLite Utilities**: [`src/utils/sqlite.jl`](https://github.com/NREL-Sienna/InfrastructureSystems.jl/blob/main/src/utils/sqlite.jl)
265+
266+
## Debugging and Inspection
267+
268+
### Querying the Databases
269+
270+
Both stores provide a `sql()` function for running custom queries:
271+
272+
```julia
273+
# Query time series associations
274+
df = InfrastructureSystems.sql(
275+
store,
276+
"SELECT * FROM time_series_associations WHERE owner_type = 'Generator'",
277+
)
278+
279+
# Query supplemental attribute associations
280+
df = InfrastructureSystems.sql(
281+
associations,
282+
"SELECT * FROM supplemental_attributes WHERE component_type = 'Bus'",
283+
)
284+
```
285+
286+
### Viewing as DataFrames
287+
288+
```julia
289+
# Time series associations as DataFrame
290+
df = InfrastructureSystems.to_dataframe(store)
291+
292+
# Supplemental attributes as records
293+
records = InfrastructureSystems.to_records(associations)
294+
```
295+
296+
### Summary Functions
297+
298+
Both stores provide summary functions:
299+
300+
```julia
301+
# Time series summaries
302+
counts = InfrastructureSystems.get_time_series_counts(store)
303+
summary_table = InfrastructureSystems.get_forecast_summary_table(store)
304+
305+
# Supplemental attribute summaries
306+
summary_table = InfrastructureSystems.get_attribute_summary_table(associations)
307+
num_attrs = InfrastructureSystems.get_num_attributes(associations)
308+
```
309+
310+
## Best Practices for Developers
311+
312+
1. **Use Transactions**: When making multiple related changes, wrap them in a SQLite transaction for atomicity and performance
313+
314+
2. **Leverage Indexes**: Design queries to take advantage of the existing indexes. Check query plans if performance is a concern.
315+
3. **Cache Statements**: For frequently-executed queries, use the cached statement methods (`_execute_cached`) rather than creating new statements each time
316+
4. **Validate Migrations**: When modifying the schema, ensure migration code is added and tested with data from older versions
317+
5. **Test with Large Datasets**: Performance characteristics can change significantly with large numbers of associations. Test with realistic data sizes.
318+
6. **Handle Edge Cases**: Consider abstract types, subtypes, and empty result sets in query logic
319+
7. **Maintain Consistency**: When adding/removing associations, ensure both the database and any in-memory caches (like `metadata_uuids` in TimeSeriesMetadataStore) are updated together
320+
321+
## Future Considerations
322+
323+
Potential areas for enhancement:
324+
325+
- **Query Optimization**: Profile and optimize hot paths, especially for large systems
326+
- **Schema Versioning**: Maintain a clear versioning strategy as the schema evolves
327+
- **Partial Indexes**: Consider partial indexes for common filtered queries
328+
- **Bulk Operations**: Optimize bulk insert/delete operations for large datasets
329+
- **Foreign Keys**: Currently not used; could add foreign key constraints for data integrity if needed
330+
- **Full-Text Search**: For advanced filtering on text fields like `name` or `features`

0 commit comments

Comments
 (0)