Currently, all Layercake tables have the following schema:
The `tags` field is the only thing that varies between layers; it is a struct and contains all of the layer-specific columns as fields.
I am considering flattening the contents of the `tags` column. This would, for example, mean that the new `settlements` layer (added in #19) would have the following schema:
Advantages:
- Flatter schemas are more familiar to most people (SQL tables, Excel spreadsheets, CSV files, etc. are typically flat) and easier to visualize (a two-dimensional data table is easy to show on paper or on a screen).
- Flatter schemas are also more straightforward to convert to other formats (e.g. exporting Layercake data to Shapefile or CSV), since column remapping isn't necessary.
- DuckDB's `describe table` and the `gpq describe` output become more useful. Currently, both commands output a table that looks much like the first table above (i.e. most of the interesting bits are hidden).
- Many queries become simpler (e.g. `where name = 'New York'` instead of `where tags.name = 'New York'`).
- Quoting tag names that have special characters (most commonly `:`) is easier. Is it `tags."building:levels"` or `"tags.building:levels"`? No need to remember anymore; just quote the whole name: `"building:levels"`.
- `tags` is a bad name for the struct column, since it is no longer a strict 1:1 representation of a feature's OSM tags (see the discussion in #7, "Support more fine-grained schema definition (non-string types)"). `columns` or `attributes` would be less wrong, but bland. Flattening the struct sidesteps having to name it.
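
As a rough illustration of the export point above, here is a minimal Python sketch using only the standard library. The records and their values are made up for illustration and are not taken from any Layercake layer; only the column names `id`, `tags`, `name`, and `building:levels` come from this issue. A flat record can be handed directly to `csv.DictWriter`, whereas the nested form needs a remapping step first.

```python
import csv
import io

# Hypothetical records; values are invented for illustration.
nested = [{"id": 1, "tags": {"name": "New York", "building:levels": "3"}}]
flat = [{"id": 1, "name": "New York", "building:levels": "3"}]


def write_csv(rows):
    """Write dict rows to a CSV string, using the first row's keys as the header."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Flat rows export as-is.
flat_csv = write_csv(flat)

# Nested rows need the struct unpacked into top-level columns first.
remapped = [{"id": r["id"], **r["tags"]} for r in nested]
nested_csv = write_csv(remapped)
```

Both paths produce the same CSV text; the difference is only the extra remapping line that the nested layout requires.
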
Disadvantages:
- The `type`, `id`, `bbox`, and `geometry` columns can potentially collide with tag keys.
- Breaks backwards compatibility for existing Layercake users.
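
To make the collision risk concrete, the following Python sketch flattens a nested record and refuses to proceed when a tag key matches one of the fixed column names. The function name `flatten_record`, the choice to raise an error, and the sample records are assumptions for illustration; this issue does not say how Layercake would handle a collision.

```python
# Fixed top-level columns named in this issue.
RESERVED = ("type", "id", "bbox", "geometry")


def flatten_record(record):
    """Lift the fields of the `tags` struct to the top level (hypothetical helper)."""
    tags = record.get("tags") or {}
    clashes = sorted(set(tags) & set(RESERVED))
    if clashes:
        # Assumption: fail loudly. Renaming or dropping the tag are other options.
        raise ValueError(f"tag keys collide with fixed columns: {clashes}")
    flat = {key: value for key, value in record.items() if key != "tags"}
    flat.update(tags)
    return flat


# Sample records with invented values.
ok = {"type": "node", "id": 1, "tags": {"name": "New York"}}
bad = {"type": "way", "id": 2, "tags": {"type": "multipolygon"}}

print(flatten_record(ok))
try:
    flatten_record(bad)
except ValueError as err:
    print(err)
```

The second record shows why the risk is not purely theoretical: `type` is itself a commonly used OSM tag key.
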
I think the advantages outweigh the disadvantages, but I'd like to hear from current users. Please comment if you have an opinion or alternative suggestions.
Note: This should have no effect on dataset sizes. Under the hood, struct fields in Parquet files get flattened anyway: each individual field becomes its own column and is stored in a contiguous block within each row group. So this is mainly about ergonomics.
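
The storage point can be pictured with a small Python sketch that lists the leaf columns a nested schema decomposes into. It models a schema as a plain dict rather than reading a real Parquet file, and the field names and type strings are placeholders, so treat it as a mental model rather than a description of Layercake's actual files.

```python
def leaf_columns(schema, prefix=""):
    """Return dotted paths of all leaf fields; nested dicts stand in for structs."""
    paths = []
    for name, field_type in schema.items():
        path = f"{prefix}{name}"
        if isinstance(field_type, dict):
            paths.extend(leaf_columns(field_type, prefix=f"{path}."))
        else:
            paths.append(path)
    return paths


# Placeholder schemas; type strings are illustrative only.
nested = {"id": "int64", "tags": {"name": "string", "building:levels": "string"}}
flat = {"id": "int64", "name": "string", "building:levels": "string"}

print(leaf_columns(nested))  # ['id', 'tags.name', 'tags.building:levels']
print(leaf_columns(flat))    # ['id', 'name', 'building:levels']
```

Both layouts yield the same number of leaf columns; only the column paths differ, which is why flattening is not expected to change file sizes.
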