feat: parse Ref from version and version numbers#5584
Conversation
|
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
Code ReviewP0 - Ambiguity between tags and branches without "/": The Currently Consider either:
P1 - Inconsistency with existing The existing impl From<&str> for Ref {
fn from(reference: &str) -> Self {
Tag(reference.to_string())
}
}But |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Ref
Ref|
I'd +1 the automated code review concern here: how do we differentiate from branches? We can maybe add some additional rules for what is a valid tag. It seems like this might be worth designing the API in an issue first before going into implementation. There any background on this feature? Why did you decide to implement it? |
My current idea is to use logic similar to Git: we do not allow the same name for refs, including branches and tags. Therefore, for any given string, we only need to check whether it is a tag or a branch.
I'm willing to start one if needed
I'm working on duckdb's versioning feature for lance: lance-format/lance-duckdb#88, for example: -- latest version on branch
SELECT * FROM my_lance.default.t AT (VERSION => 'feat')
-- branch with version
SELECT * FROM my_lance.default.t AT (VERSION => 'feat:45')
-- tag
SELECT * FROM my_lance.default.t AT (VERSION => 'v1.0')
SELECT * FROM my_lance.default.t AT (TIMESTAMP => TIMESTAMP '2025-09-22 12:32:43.217')Note that DuckDB does not allow extending the |
|
Linking a similar Spark PR as related discussions. IMO I think requiring the implementation to discern whether a string passed as version is a branch or a tag presents some ambiguity and an unnecessary restriction that you can not have a branch and a tag with the same name -- something we don't enforce today IIUC. The proposal on Spark-side is to extend the table identifier to include a branch. So from your examples, something like: Looking into duckdb statments I would make a similar argument here that our solution to support branches should span further than |
| #[rstest] | ||
| fn test_parse_ref_ok( | ||
| #[values( | ||
| ("0", Ref::VersionNumber(0)), |
There was a problem hiding this comment.
Could we add a ref/ prefix before the branch or tag, so that we can differentiate a version 100 vs a branch/tag "100"?
There was a problem hiding this comment.
Another thought I proposed in Spark is that ref/ could indicate the delimiter is /. That means if we do ref/branch/tag then we know to use / to split branch and tag.
If a branch has a / (which is common), we an use a different delimiter, e.g. ref:feature/abc:100.
Basically the first 3 characters equaling ref indicates its not a version number, and the forth character is used as the delimiter.
What do we think? @Xuanwo @wjones127 @hamersaw
There was a problem hiding this comment.
Sounds great to me! Just a few clarifying questions:
- Do we need to prepend all tags with branches when using
refthen? ex. withref/foois "foo" a branch or a tag (ie. tag "foo" on branch "main")? - How do we handle ambiguity if a branch / version / tag is specified in both the table ID and in the
VERSION AS OFclause? Just error out?
I've been trying to convince myself that it would be more simple (parsing + API) to only support branches in table IDs and versions / tags in VERSION AS OF clauses (within SELECT and CREATE TAG statements), but that's probably a bad idea if we parse things into our existing Ref type (which contains branch, version, and tag already).
I think we definitely don't want to do another level of nesting like If we use the With that, I think it would be convenient to also reserve |
The "main" has been already reversed and it always points to the actual main branch. This behavior is guaranteed by
Totally agree with this approach about the syntax handling. Back to this PR. I'm wondering if introducing complex string conversion rules at this low level is necessary. I've considered this before and have a few concerns:
There are alternative approaches, such as adding dedicated APIs or providing utility functions. But the conversion logic in this PR now looks fine to me. |
| return Tag(reference.to_string()); | ||
| } | ||
|
|
||
| let branch = if branch == MAIN_BRANCH { |
There was a problem hiding this comment.
nit: standardize_branch was designed as the entry point for this scenario
This PR allows users to parse a str into
Ref.For example:
Parts of this PR were drafted with assistance from Codex (with
gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.