Skip to content

Conversation

@JoOkuma
Copy link
Member

@JoOkuma JoOkuma commented Jan 20, 2026

Closes #193 and #243

- Add AttrSchema dataclass with defensive copying
- Create process_attr_key_args() helper to eliminate duplication
- Refactor dtype inference and SQLAlchemy mapping to use dictionaries
- Add overloaded method signatures for convenience and schema modes
- Replace _node/edge_attr_keys lists with _attr_schemas dicts
- Add comprehensive test suite (16 tests passing)

Breaking change: dtype parameter now required. Next phase will update
all calling sites throughout the codebase.
…add_edge_attr_key

This is a breaking change that requires dtype (polars.DataType) to be explicitly
specified when adding attribute keys to graphs.

Core changes:
- Add AttrSchema dataclass to store key, dtype, and default_value together
- Add utility functions: infer_default_value_from_dtype, polars_dtype_to_sqlalchemy_type,
  validate_default_value_dtype_compatibility, and process_attr_key_args
- Update BaseGraph abstract methods with overloaded signatures supporting both
  direct parameters and AttrSchema objects
- Replace _node_attr_keys/_edge_attr_keys lists with _node_attr_schemas dicts
  in RustWorkXGraph and SQLGraph
- Remove _boolean_columns tracking in SQLGraph (now uses AttrSchema)
- Fix _polars_schema_override to use == instead of isinstance for pl.Boolean check

Backend implementations:
- RustWorkXGraph: Use process_attr_key_args helper, store AttrSchema objects
- SQLGraph: Use process_attr_key_args helper, pass AttrSchema to _add_new_column
- GraphView: Delegate to root graph with all parameters preserved
- Add dtype inference fallback for complex objects that polars can't parse

Updated all callers:
- Graph methods: from_other(), match()
- Node operators: RandomNodes, RegionPropsNodes, GenericNodesOperator
- Edge operators: DistanceEdges, GenericEdgesOperator
- Solvers: ILPSolver, NearestNeighborsSolver
- IO: numpy_array, ctc
Update all test files to use the new required dtype parameter for
add_node_attr_key and add_edge_attr_key methods.

Key test fixes:
- Add dtype inference logic in test_add_node_attr_key based on value type
- Fix bbox attribute calls to use pl.Array(pl.Int64, size) instead of
  passing numpy arrays as dtype
- Add missing polars imports where needed
- Fix argument order from (key, False, dtype=pl.Boolean) to
  (key, pl.Boolean, default_value=False)
@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 91.00529% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.15%. Comparing base (574694d) to head (1cb217f).

Files with missing lines Patch % Lines
src/tracksdata/graph/_sql_graph.py 85.00% 4 Missing and 2 partials ⚠️
src/tracksdata/graph/_graph_view.py 73.33% 2 Missing and 2 partials ⚠️
src/tracksdata/graph/_rustworkx_graph.py 90.24% 3 Missing and 1 partial ⚠️
src/tracksdata/utils/_dtypes.py 96.61% 1 Missing and 1 partial ⚠️
src/tracksdata/nodes/_mask.py 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
- Coverage   88.35%   88.15%   -0.21%     
==========================================
  Files          56       56              
  Lines        4104     4203      +99     
  Branches      714      733      +19     
==========================================
+ Hits         3626     3705      +79     
- Misses        291      311      +20     
  Partials      187      187              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JoOkuma
Copy link
Member Author

JoOkuma commented Jan 20, 2026

Hi @TeunHuijben and @yfukai,

I would appreciate it if you both could review this PR, at least the public API, since it involves breaking changes.
Before you do that, I need to give it a pass because a big part of it was generated by an LLM.
I'll tag you again once I'm done.

rx_graph = self.rx_graph
for node_id in rx_graph.node_indices():
rx_graph[node_id][key] = default_value
rx_graph[node_id][schema.key] = schema.default_value
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to add the default_value?
This could be solved when returning the attrs value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think returning the attrs value for unset attribute would be better.

# Add to all existing edges
for edge_attr in self.rx_graph.edges():
edge_attr[key] = default_value
edge_attr[schema.key] = schema.default_value
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@TeunHuijben
Copy link
Contributor

Haven't personally tested the code, but the API looks good!

@JoOkuma
Copy link
Member Author

JoOkuma commented Jan 21, 2026

Hi @yfukai , this should be ready for review.
I'm going to open a separate PR that stops setting the attributes of existing nodes when adding a new key, and it will happen when querying node|edge attributes.

@TeunHuijben
Copy link
Contributor

Hi @JoOkuma and @yfukai, what is the status of this PR, I would really like the default-feature PR to be merged, which depends on this one 😇

@yfukai
Copy link
Contributor

yfukai commented Jan 23, 2026

Hi sorry @TeunHuijben I'm still reviewing it, I'll finish that within this weekend!

@TeunHuijben
Copy link
Contributor

Hi @yfukai, no worries, take your time! Was just wondering whether your thumbs-up meant approval or whether you were reviewing. Thank you! 😊

Copy link
Contributor

@yfukai yfukai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much nicer than what we currently have, thanks @JoOkuma!
I think this PR is good to merge, but

  • It would be even better if the dtype does not change with a round-trip with from_other. For this purpose, maybe we can store the AttrSchemas into a read-only metadata.
  • For RustworkXGraph, maybe we can skip setting the default value to the rx_graph and return the default value for unset attr to save the initialization time. → I found this addressed in the next PR, thanks!

rx_graph = self.rx_graph
for node_id in rx_graph.node_indices():
rx_graph[node_id][key] = default_value
rx_graph[node_id][schema.key] = schema.default_value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think returning the attrs value for unset attribute would be better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Explicit dtype for the node and edge attrs?

4 participants