-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Support Arrow IPC stream files with same public interface #18559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Arrow IPC stream files with same public interface #18559
Conversation
| impl Default for ArrowSource { | ||
| fn default() -> Self { | ||
| Self::default_file_source() | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to have some conflicts with #18386 FYI. Happy to help resolve them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, I'm just eager to get this merged. I'm at five separate reviewers so far 😭, I'm just hoping this last pass is enough to clear the hurdle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, sorry for all of the work you've had to go through, but it's all for a better outcome in the end! I will try to review this tomorrow but I have a flight so no promises. For what it's worth though I don't think this is likely to be rushed into the 51 release since it's mostly adding a feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better for it in the end, too, really. And from reading your PR I got the same impression that it wouldn't make it into 51.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and don't worry about the review! Flying in the US looks to be rough enough on its own and getting worse and I absolutely wouldn't want to add to your stress. Safe travels!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merge conflicts weren't bad at all, fwiw
|
I'm pulling these changes back into the original PR where there's the most history. Thanks for checking this out @adriangb! |
Another take on #18457 -- this time we're not changing the public interface.
Which issue does this PR close?
register_arrowor similar #16688.Rationale for this change
Currently Datafusion can only read Arrow files if the're in the File format, not the Stream format. I work with a bunch of Stream format files and wanted native support.
What changes are included in this PR?
To accomplish the above, this PR splits the Arrow datasource into two separate implementations (
ArrowStreamFile*andArrowFile*) with a facade on top to differentiate between the formats at query planning time. This maintains the same public interface to maintain backwards compatibility.Are these changes tested?
Yes, there are end-to-end sqllogictests along with tests for the changes within datasource-arrow.
Are there any user-facing changes?
Technically yes, in that we support a new format now. I'm not sure which documentation would need to be updated?