Consider flattening table schemas

Currently, all Layercake tables have the following schema:

| column_name | column_type                                            |
|-------------|--------------------------------------------------------|
| type        | VARCHAR                                                |
| id          | BIGINT                                                 |
| tags        | STRUCT(...)                                            |
| bbox        | STRUCT(xmin FLOAT, ymin FLOAT, xmax FLOAT, ymax FLOAT) |
| geometry    | GEOMETRY                                               |

The `tags` field is the only thing that varies between layers; it is a struct and contains all of the layer-specific columns as fields.

**I am considering flattening the contents of the `tags` column.** This would for example mean that the new `settlements` layer (added in #19) would have the following schema:

|  column_name   |                      column_type                       |
|----------------|--------------------------------------------------------|
| type           | VARCHAR                                                |
| id             | BIGINT                                                 |
| place          | VARCHAR                                                |
| name           | VARCHAR                                                |
| names          | MAP(VARCHAR, VARCHAR)                                  |
| alt_name       | VARCHAR                                                |
| alt_names      | MAP(VARCHAR, VARCHAR)                                  |
| official_name  | VARCHAR                                                |
| official_names | MAP(VARCHAR, VARCHAR)                                  |
| wikidata       | VARCHAR                                                |
| wikipedia      | VARCHAR                                                |
| population     | UBIGINT                                                |
| bbox           | STRUCT(xmin FLOAT, ymin FLOAT, xmax FLOAT, ymax FLOAT) |
| geometry       | GEOMETRY                                               |

Advantages:
- Flatter schemas are more familiar to most people (SQL tables, Excel spreadsheets, CSV files, etc are typically flat) and easier to visualize (a two-dimensional data table is easy to show on paper or on a screen).
- Flatter schemas are also more straightforward to convert to other formats (e.g. export from Layercake data to Shapefile or CSV), since column remapping isn't necessary.
- DuckDB's `describe table` and the `gpq describe` output are more useful. Currently, both commands output a table that looks much like the first table above (i.e. most of the interesting bits are hidden).
- Many queries become simpler (e.g. `where name = 'New York'` instead of `where tags.name = 'New York'`).
- Quoting tag names that have special characters (most commonly `:`) is easier. Is it `tags."building:levels"` or `"tags.building:levels"`? No need to remember anymore, just quote the whole name: `"building:levels"`.
- `tags` is a bad name for the struct column since it is no longer a strict 1:1 representation of a feature's OSM tags (see discussion in #7); `columns` or `attributes` would be less wrong, but bland; flattening the struct sidesteps having to name it.

Disadvantages:
- `type`, `id`, `bbox`, and `geometry` columns can potentially collide with tag keys.
- Breaks backwards compatibility for existing Layercake users.

I think the advantages outweigh the disadvantages, but I'd like to hear from current users. Please comment if you have an opinion or alternative suggestions.

Note: This should have no effect on dataset sizes; under the hood struct fields in Parquet files get flattened anyways (each individual field becomes its own column and is stored in a contiguous block within each row group). So this is mainly about ergonomics.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Consider flattening table schemas #25

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

column_name	column_type
type	VARCHAR
id	BIGINT
tags	STRUCT(...)
bbox	STRUCT(xmin FLOAT, ymin FLOAT, xmax FLOAT, ymax FLOAT)
geometry	GEOMETRY

Uh oh!

Consider flattening table schemas #25

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions