-
Notifications
You must be signed in to change notification settings - Fork 62
Add more options to df_engine docker build, slim down features for some packages #1540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1540 +/- ##
==========================================
+ Coverage 83.44% 83.53% +0.09%
==========================================
Files 428 428
Lines 119043 119720 +677
==========================================
+ Hits 99338 100014 +676
- Misses 19171 19172 +1
Partials 534 534
🚀 New features to boost your workflow:
|
| azure_identity = { workspace = true, optional = true } | ||
| flate2 = { workspace = true, optional = true } | ||
| reqwest = { workspace = true, optional = true } | ||
| reqwest = { workspace = true, optional = true, default-features = false, features = ["rustls-tls"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of comments here -
- No need to repeat
[default-features = false]in the crate if the workspace already sets it. - It’s better to let the crate manage its own reqwest features instead of forcing them at the workspace level.
- Pick one of
rustls-tlsorrustls-tls-native-roots. They both pull in their own CA bundles, so using both at once doesn’t help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review!
No need to repeat [default-features = false] in the crate if the workspace already sets it.
Good to know! Removed it in the otap crate.
It’s better to let the crate manage its own reqwest features instead of forcing them at the workspace level.
Makes perfect sense, removed at the workspace level.
Pick one of rustls-tls or rustls-tls-native-roots. They both pull in their own CA bundles, so using both at once doesn’t help.
Good catch, I meant to pick rustls-tls not rustls-tls-native-roots and forgot to update both places. Removed the workspace level feature anyways now, so we just have rustls-tls in the otap crate.
|
One thing to keep in mind: the PR switches the TLS backend from |
Yeah absolutely, my goal here is just to try to get things working in the context of the existing docker build. We actually have a couple of other challenges on the Microsoft side related to crypto with this project, because I started trying to get arrow-rs-object-store to allow for pluggable crypto a few months ago rather than just swap from |
| # RUSTFLAGS can be used to pass argument to rustc e.g. | ||
| # --build-arg RUSTFLAGS="-C target-feature=-gfni" | ||
| ARG RUSTFLAGS | ||
| ENV RUSTFLAGS=${RUSTFLAGS} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like that we have the ability to set the rust flags for the docker build!
However, I think the impact of this is that we'll override the default override the default rustflags from ./.cargo/config.toml even if --build-arg RUSTFLAGS... isn't set.
FYI, we set -C target-cpu=native in https://github.com/open-telemetry/otel-arrow/blob/main/rust/otap-dataflow/.cargo/config.toml which is probably why you're seeing the AVX512 instructions which your build machine supports.
You'll see the rust flags getting set in the build if you add a -v to the cargo build step in rust/otap-dataflow/cross-arch-build.sh and run the docker build with --progress plain. On main, we'll see output like this:
#18 15.90 Running `/usr/local/rustup/toolchains/stable-aarch64-unknown-linux-gnu/bin/rustc <...snip...> -C target-cpu=native
On this PR's branch, we don't see -C target-cpu=native.
Without target-cpu=native, we wouldn't get micro-architecture specific instructions in the final binary, which we would probably want for best performance in many cases.
That said, the way we're setting the target-cpu in ./cargo/config.toml is probably not correct if we eventually want to publish otap-dataflow on crates.io (which I think we eventually do #1340). I think this will cause our library compiled w/ native instructions, which users who depend on it might not expect.
One thing we could do is:
- a) remove the rustflags configuration from
./.cargo/config.toml - b) set the default value for the RUSTFLAGS docker ARG to
-C target-cpu=native
Does that sound OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clhain just want to confirm something -- for the performance tests in this repo, we're always using the docker images right? e.g. we're not building using cross-arch-build.sh directly (outside of docker) docker and then just running the resulting binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context, I had no idea we already had default args in .cargo/config.toml!
That totally works for me, but what do you think about keeping the configuration in .cargo/config.toml so that folks will still get best performance by default when building/benchmarking locally and then still add -C target-cpu=native as the default for the docker build as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah good point -- I hadn't thought about folks doing local benchmarking.
OK, sounds good. Let's leave .cargo/config.toml as is for now, and just set -C target-cpu=native as the default for the docker build.
This PR addresses a few problems that I ran into getting
df_enginerunning in my kubernetes environment. The end result of all this is that you can now run a docker build like this:docker build --build-arg FEATURES="azure" --build-arg RUSTFLAGS="-C target-feature=-gfni" --build-arg DEBUG=1 --build-context otel-arrow=../../ -f Dockerfile -t df_engine .TLS
Some of the Azure SDK crates + reqwest have a lot of default features enabled that we are not using and which rely on linking with openssl for TLS. This was giving me trouble with our musl build.
I've disabled default features for these packages at the workspace level and opted for using
rustls-tlswith reqwest in theotapcrate.Configuring the docker build
FEATURESbuild arg to the dockerfile and the cross platform build script so that you can turn on features likeazurewhen buildingRUSTFLAGSfor the docker build so that we can turn some features on or off when compiling which brings me to my next item...Adding some optional debug utils to the image
When running on k8s I was getting very mysterious crashes after a couple of seconds with 0 logs to stdout. Remoting into the container and running the binary gave me the "Illegal instruction (core dumped)" message, which was a start.
After checking the obvious vector extensions like SSE/AVX/AVX512, I was stumped. So, I added gdb to the container and found the exact instruction which is the longest mnemonic I have ever seen (comes from arrow's buffer crate):
Turns out my laptop has support for a specific feature called
gfni, which the nodes with Intel Xeon processors that I was running did not.I think this will not be the last time someone needs to debug a native code/cross-compilation problem, so I added an option to the build called
DEBUGthat will includegdb(and maybe other debug utilities in the future) in the image.