fix(compile, ut): some compile/ut issues #29
Conversation
@lxy-9602 PTAL, thanks
[10, 1, 0, "2008-12-28 00:00:00.000123456", null, "dad"],
[10, 1, 100, "2008-12-28 00:00:00.00012345", "-123.45000", "eat"],
[10, 1, null, "1899-01-01 00:59:20.001001001", "0.00000", "fat"],
[10, 1, null, "1899-01-01 01:05:03.001001001", "0.00000", "fat"],
This unit test runs normally on my local machine.
I will recheck it.
It seems these tests pass as expected in our local environments and CI.
I found it.
Do you mean that ChunkedArrayFromJSON uses arrow_vendored::date::current_zone()? I looked at the source code, and it seems to always use the UTC time zone.
My tzdata version is as follows; could you share your environment details?
xx@n31-146-204:~$ apt show tzdata
Package: tzdata
Version: 2025c-3
Priority: required
Section: localization
Maintainer: GNU Libc Maintainers <[email protected]>
Installed-Size: 1,391 kB
Provides: tzdata-forky
Depends: debconf (>= 0.5) | debconf-2.0
Breaks: tzdata-legacy (= 2023c-8), tzdata-legacy (= 2023c-9)
Replaces: tzdata-legacy (= 2023c-8), tzdata-legacy (= 2023c-9)
Homepage: https://www.iana.org/time-zones
Tag: role::app-data, use::timekeeping
Download-Size: 263 kB
APT-Sources: http://mirrors.aliyun.com/debian testing/main amd64 Packages
Description: time zone and daylight-saving time data
This package contains data required for the implementation of
standard local time for many representative locations around the
globe. It is updated periodically to reflect changes made by
political bodies to time zone boundaries, UTC offsets, and
daylight-saving rules.
N: There is 1 additional record. Please use the '-a' switch to see it
xx@n31-146-204:~$ zdump -v Asia/Shanghai | grep 1900
Asia/Shanghai Mon Dec 31 15:54:16 1900 UT = Mon Dec 31 23:59:59 1900 LMT isdst=0 gmtoff=29143
Asia/Shanghai Mon Dec 31 15:54:17 1900 UT = Mon Dec 31 23:54:17 1900 CST isdst=0 gmtoff=28800
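For reference, the gap between the two gmtoff values in the zdump output above is exactly the pre-1901 LMT offset discussed in this PR; a trivial sketch of that arithmetic (offsets copied from the zdump lines, nothing else assumed):

```cpp
// Offsets are seconds east of UTC, taken from the zdump output above.
constexpr int kLmtOffset = 29143;  // Asia/Shanghai LMT, before the 1901 transition
constexpr int kCstOffset = 28800;  // Asia/Shanghai CST (UTC+8)

// Extra shift applied to timestamps before the 1901 transition:
// 343 seconds, i.e. 5 minutes 43 seconds.
constexpr int LmtExtraSeconds() { return kLmtOffset - kCstOffset; }
```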
Sure.
PRETTY_NAME="Ubuntu 24.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.3 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
Package: tzdata
Version: 2025b-0ubuntu0.24.04.1
Priority: important
Section: libs
Origin: Ubuntu
Maintainer: Ubuntu Developers <[email protected]>
Original-Maintainer: GNU Libc Maintainers <[email protected]>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 1420 kB
Provides: tzdata-trixie
Depends: debconf (>= 0.5) | debconf-2.0
Breaks: tzdata-legacy (= 2023c-8ubuntu1)
Replaces: tzdata-legacy (= 2023c-8ubuntu1)
Homepage: https://www.iana.org/time-zones
Task: cloud-minimal, minimal, server-minimal
Download-Size: 276 kB
APT-Manual-Installed: no
APT-Sources: http://archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages
Description: time zone and daylight-saving time data
Asia/Shanghai Mon Dec 31 15:54:16 1900 UT = Mon Dec 31 23:59:59 1900 LMT isdst=0 gmtoff=29143
Asia/Shanghai Mon Dec 31 15:54:17 1900 UT = Mon Dec 31 23:54:17 1900 CST isdst=0 gmtoff=28800
I think the problem is probably at the ORC layer.
https://github.com/apache/orc/blob/main/c%2B%2B/src/ColumnReader.cc#L327
Regarding the timezone handling for ORC here: if the reader's and writer's timezones differ, the final displayed results will differ. I've checked the code and confirmed this is indeed the case.
In the code, the writer timezone works like this: if no metadata is recorded in the ORC file, the local timezone is used directly. The reader timezone follows the configuration in the row_reader options; the default is GMT. So if the local timezones differ, the displayed results will appear inconsistent. Could you please verify the code? @ChaomingZhangCN @lucasfang
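As a rough illustration of the conversion described above (this is not the actual ORC code; the real logic in ColumnReader.cc also handles per-instant offsets and DST transitions), a hypothetical helper that shifts a stored value by the difference between the writer's and reader's UTC offsets:

```cpp
#include <cstdint>

// Hypothetical sketch: a wall-clock timestamp recorded under the writer's
// timezone is re-expressed for a reader in a different timezone by applying
// the difference of their UTC offsets (seconds east of UTC). The sign
// convention here is illustrative only.
int64_t AdjustForReaderTz(int64_t writerSeconds,
                          int writerOffsetSec,
                          int readerOffsetSec) {
  return writerSeconds + (writerOffsetSec - readerOffsetSec);
}
```

With identical writer and reader offsets the adjustment is zero, which matches the expectation in this thread that readers sharing the writer's timezone should see consistent values.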
We know that the ORC reader performs timezone conversion based on the writer's and reader's timezones. However, the writer's timezone is fixed, and all readers use Asia/Shanghai, so the timezone conversion logic should be consistent. It's puzzling to see such a discrepancy.
@lxy-9602 @ChaomingZhangCN PTAL, thanks

In that case, the integration test case
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
for (const auto& reader : index_file_readers) {
  if (auto* r = dynamic_cast<BitSliceIndexBitmapFileIndexReader*>(reader.get())) {
    bsi_reader = r;
  } else if (auto* r = dynamic_cast<BitmapFileIndexReader*>(reader.get())) {
    bitmap_reader = r;
  } else if (auto* r = dynamic_cast<BloomFilterFileIndexReader*>(reader.get())) {
    bloom_filter_reader = r;
  }
}
ASSERT_TRUE(bsi_reader);
ASSERT_TRUE(bitmap_reader);
ASSERT_TRUE(bloom_filter_reader);
Copilot AI, Dec 24, 2025
The loop iterating through index_file_readers to find the different reader types has a potential issue: if multiple readers of the same type exist, only the last one encountered will be assigned. While this may not be an issue in practice if the implementation guarantees at most one reader of each type, it would be safer to add assertions within the loop to verify that each reader type is encountered at most once. This would make the assumption explicit and catch potential bugs if the behavior changes in the future.
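One hedged sketch of what Copilot's suggestion could look like. The class definitions below are stand-ins (the real readers live in the project's headers; only the names are taken from the snippet above), and the duplicate checks use plain assert where the test would use a gtest macro:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Stand-in hierarchy mirroring the names in the test snippet above.
struct FileIndexReader { virtual ~FileIndexReader() = default; };
struct BitSliceIndexBitmapFileIndexReader : FileIndexReader {};
struct BitmapFileIndexReader : FileIndexReader {};
struct BloomFilterFileIndexReader : FileIndexReader {};

// Assigns each reader pointer and asserts each type appears at most once,
// making the "at most one reader per type" assumption explicit.
void FindReaders(const std::vector<std::unique_ptr<FileIndexReader>>& readers,
                 BitSliceIndexBitmapFileIndexReader** bsi,
                 BitmapFileIndexReader** bitmap,
                 BloomFilterFileIndexReader** bloom) {
  *bsi = nullptr;
  *bitmap = nullptr;
  *bloom = nullptr;
  for (const auto& reader : readers) {
    if (auto* r = dynamic_cast<BitSliceIndexBitmapFileIndexReader*>(reader.get())) {
      assert(*bsi == nullptr && "duplicate BSI reader");
      *bsi = r;
    } else if (auto* r = dynamic_cast<BitmapFileIndexReader*>(reader.get())) {
      assert(*bitmap == nullptr && "duplicate bitmap reader");
      *bitmap = r;
    } else if (auto* r = dynamic_cast<BloomFilterFileIndexReader*>(reader.get())) {
      assert(*bloom == nullptr && "duplicate bloom filter reader");
      *bloom = r;
    }
  }
}
```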
for (const auto& reader : index_file_readers) {
  if (auto* r = dynamic_cast<BitSliceIndexBitmapFileIndexReader*>(reader.get())) {
    bsi_reader = r;
  } else if (auto* r = dynamic_cast<BitmapFileIndexReader*>(reader.get())) {
    bitmap_reader = r;
  } else if (auto* r = dynamic_cast<BloomFilterFileIndexReader*>(reader.get())) {
    bloom_filter_reader = r;
  }
{
  ASSERT_OK_AND_ASSIGN(
      auto result, bitmap_reader->VisitIn({Literal(Timestamp(1745542802000l, 0)),
}
ASSERT_TRUE(bsi_reader);
ASSERT_TRUE(bitmap_reader);
ASSERT_TRUE(bloom_filter_reader);
Copilot AI, Dec 24, 2025
The loop iterating through index_file_readers to find the different reader types has a potential issue: if multiple readers of the same type exist, only the last one encountered will be assigned. While this may not be an issue in practice if the implementation guarantees at most one reader of each type, it would be safer to add assertions within the loop to verify that each reader type is encountered at most once. This would make the assumption explicit and catch potential bugs if the behavior changes in the future.
for (const auto& reader : index_file_readers) {
  if (auto* r = dynamic_cast<BitSliceIndexBitmapFileIndexReader*>(reader.get())) {
    bsi_reader = r;
  } else if (auto* r = dynamic_cast<BitmapFileIndexReader*>(reader.get())) {
    bitmap_reader = r;
  } else if (auto* r = dynamic_cast<BloomFilterFileIndexReader*>(reader.get())) {
    bloom_filter_reader = r;
  }
{
  ASSERT_OK_AND_ASSIGN(
      auto result, bitmap_reader->VisitIn({Literal(Timestamp(1745542802001l, 0)),
}
ASSERT_TRUE(bsi_reader);
ASSERT_TRUE(bitmap_reader);
ASSERT_TRUE(bloom_filter_reader);
Copilot AI, Dec 24, 2025
The loop iterating through index_file_readers to find the different reader types has a potential issue: if multiple readers of the same type exist, only the last one encountered will be assigned. While this may not be an issue in practice if the implementation guarantees at most one reader of each type, it would be safer to add assertions within the loop to verify that each reader type is encountered at most once. This would make the assumption explicit and catch potential bugs if the behavior changes in the future.
for (const auto& reader : index_file_readers) {
  if (auto* r = dynamic_cast<BitSliceIndexBitmapFileIndexReader*>(reader.get())) {
    bsi_reader = r;
  } else if (auto* r = dynamic_cast<BitmapFileIndexReader*>(reader.get())) {
    bitmap_reader = r;
  } else if (auto* r = dynamic_cast<BloomFilterFileIndexReader*>(reader.get())) {
    bloom_filter_reader = r;
  }
}
ASSERT_TRUE(bsi_reader);
ASSERT_TRUE(bitmap_reader);
ASSERT_TRUE(bloom_filter_reader);
Copilot AI, Dec 24, 2025
The loop iterating through index_file_readers to find the different reader types has a potential issue: if multiple readers of the same type exist, only the last one encountered will be assigned. While this may not be an issue in practice if the implementation guarantees at most one reader of each type, it would be safer to add assertions within the loop to verify that each reader type is encountered at most once. This would make the assumption explicit and catch potential bugs if the behavior changes in the future.
I’m building the project with Blade, and I haven’t run the integration tests yet. I’ve already run into some issues with the unit tests, so I’ll look into them further.
Purpose
Linked issue: close #xxx
Background: compile and run under gcc8.
1. unordered_map iteration order is undefined, so the UT was changed to not depend on the element order of the returned vector.
2. Fix the UT's expected data; the hard-coded results mismatch for the following reasons:
(1) Refer to https://github.com/eggert/tz/blob/main/asia#L653: when using the Asia/Shanghai timezone, timestamps prior to 1901 carry an additional offset of 5 minutes and 43 seconds.
(2) Refer to https://github.com/HowardHinnant/date/blob/master/include/date/tz.h: the Arrow-vendored tz library reads /etc/localtime directly to obtain the local timezone.
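For point 1 above, a minimal sketch of one way to make a check independent of unordered_map iteration order, sorting the extracted values before comparing against a sorted expectation (names here are illustrative, not from the patch):

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Collects the keys of an unordered_map and sorts them, so a test can
// compare against a sorted expectation instead of relying on the
// container's unspecified iteration order.
std::vector<std::string> SortedKeys(
    const std::unordered_map<std::string, int>& m) {
  std::vector<std::string> keys;
  keys.reserve(m.size());
  for (const auto& [key, value] : m) keys.push_back(key);
  std::sort(keys.begin(), keys.end());
  return keys;
}
```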
Tests
API and Format
Documentation