Remove ZetaSQL and Modernize Build Toolchain #224
Open
czgdp1807 wants to merge 12 commits into google:master from
Build Performance Improvement

Before vs After

Improvement: 5m 58s faster (74% reduction), ~3.8x speedup

Performance Analysis

The dramatic build time improvement is almost entirely due to ZetaSQL removal:

Primary Factor: ZetaSQL Dependency Chain (~6 minutes saved)

Toolchain Modernization

The dependency upgrades (Bazel Skylib, rules_foreign_cc, Google Test, SQLite) likely contributed little to the build time improvement. These updates primarily provide:

Conclusion

ZetaSQL removal is responsible for virtually all of the 74% build time reduction. The toolchain modernization delivers value through improved maintainability and future compatibility.

Machine Information

czgdp1807@qgpu1:~/ml-metadata$ uname -a
Linux qgpu1 6.8.0-51-generic #52-Ubuntu SMP PREEMPT_DYNAMIC Thu Dec 5 13:09:44 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
czgdp1807@qgpu1:~/ml-metadata$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: AuthenticAMD
Model name: AMD Ryzen Threadripper 2970WX 24-Core Processor
CPU family: 23
Model: 8
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU(s) scaling MHz: 74%
CPU max MHz: 3000.0000
CPU min MHz: 2200.0000
BogoMIPS: 5987.95
czgdp1807@qgpu1:~/ml-metadata$ grep MemTotal /proc/meminfo | awk '{print $2/1024/1024 " GB\n"}'
125.726 GB
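As a rough consistency check on the figures quoted in this comment, the sketch below back-calculates the approximate before and after build times from the two reported numbers (5m 58s saved, 74% reduction). The derived times are estimates only: the 74% figure is rounded, so the real measurements may differ by several seconds.

```python
# Back-of-the-envelope check of the reported build-time numbers.
# Inputs come from the comment; everything derived is an estimate.
saved_s = 5 * 60 + 58          # 5m 58s saved
reduction = 0.74               # reported 74% reduction (rounded)

before_s = saved_s / reduction         # estimated original build time
after_s = before_s - saved_s           # estimated new build time
speedup = before_s / after_s           # equals 1 / (1 - reduction)


def fmt(seconds: float) -> str:
    """Format a duration in seconds as 'XmYYs'."""
    total = round(seconds)
    return f"{total // 60}m{total % 60:02d}s"


print(f"before ~ {fmt(before_s)}, after ~ {fmt(after_s)}, speedup ~ {speedup:.1f}x")
```

The computed speedup agrees with the "~3.8x" stated above, so the three reported figures are mutually consistent.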
Native build on Linux is now possible with
Remove ZetaSQL integration and filter_query functionality to eliminate a complex dependency chain and simplify the build process. The filter_query feature relied on ZetaSQL for SQL query parsing and validation.

Changes:
- Remove com_google_zetasql from WORKSPACE
- Comment out filter_query dependencies in metadata_store BUILD
- Remove filter_query implementation from query executors
- Remove filter_query tests from metadata_store_test.py
- Return UnimplementedError for filter_query usage

This prepares the codebase for toolchain modernization by reducing dependency complexity. Users relying on filter_query should migrate to alternative filtering approaches.
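One possible shape for such a migration is to fetch records with an ordinary list call and apply the predicate in Python. The sketch below is illustrative only: `FakeArtifact` and `filter_client_side` are hypothetical names, and the dataclass is a stand-in for whatever a real call such as `MetadataStore.get_artifacts()` returns, not the actual MLMD proto.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class FakeArtifact:
    """Hypothetical stand-in for an artifact record (not the real MLMD type)."""
    id: int
    uri: str
    custom_properties: Dict[str, str] = field(default_factory=dict)


def filter_client_side(
    records: Iterable[FakeArtifact],
    predicate: Callable[[FakeArtifact], bool],
) -> List[FakeArtifact]:
    """Keep only the records for which predicate(record) is true."""
    return [record for record in records if predicate(record)]


# In real use this list would come from the store's list operation.
artifacts = [
    FakeArtifact(1, "/data/train", {"split": "train"}),
    FakeArtifact(2, "/data/eval", {"split": "eval"}),
    FakeArtifact(3, "/models/v1"),
]

# Roughly what a server-side filter on the URI prefix would have expressed.
data_only = filter_client_side(artifacts, lambda a: a.uri.startswith("/data/"))
print([a.id for a in data_only])
```

Client-side filtering transfers every record before discarding the non-matching ones, so it suits small and medium stores better than very large ones.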
Upgrade build system components to latest stable versions for improved compatibility with Ubuntu 24.04 and GCC 13.

Bazel Dependencies:
- rules_foreign_cc 0.9.0 -> 0.12.0 (removed obsolete patch)
- Bazel Skylib 1.5.0 -> 1.7.1
- Abseil 20230802.1 (kept for compatibility)
- Google Test 1.12.1 -> 1.15.2
- pybind11 2.10.1 -> 2.13.6 (Python 3.13 support)
- SQLite 3.39.2 -> 3.47.2

Build Configuration:
- Add --host_cxxopt="-std=c++17" for host tool compilation
- Fix GCC 13 compatibility in libmysqlclient.BUILD
- Add MariaDB stdint.h type mappings

Fixes:
- Abseil C++14 requirement error in host tool compilations
- GCC 13 implicit function declaration warnings
- MariaDB type compatibility with modern compilers
Add automated setup script and environment configuration for building ml-metadata natively on Linux without Docker.

- environment.yml: mamba environment with Python 3.11, GCC toolchain
- setup_and_build.sh: automated Miniforge/Bazelisk install and build
- .bazelrc: GCC compatibility flags (C11/C++17, warning suppressions)

Enables building on SSH/bare metal systems without sudo access.
Enable native builds on Apple Silicon by fixing compilation issues:
- Update zlib to 1.3.1 and patch upb for C23 compatibility
- Configure libmysqlclient and PostgreSQL for ARM64 architecture
- Fix Python extension symbol exports for macOS linker
- Add conda environment specification for macOS development

Resolves platform-specific type conflicts, iconv detection, and x86-specific assembly code issues on darwin_arm64.
The --no-as-needed flag is GNU ld specific and not supported on macOS. Make it Linux-only and add macOS-specific -Wl,-undefined,dynamic_lookup flag for ARM64 builds. Fixes protoc/grpc plugin linking failures.
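The platform split described in this commit can be pictured with a small Python sketch. It is not the PR's actual Bazel configuration: `extension_linkopts` is a hypothetical helper, the `-Wl,` prefix on `--no-as-needed` is an assumption about how the flag reaches the linker, and the macOS branch is applied to all Darwin hosts here although the commit mentions ARM64 builds specifically.

```python
import platform
from typing import List


def extension_linkopts(system: str) -> List[str]:
    """Return illustrative linker options for the given platform.

    `system` uses the names returned by platform.system(),
    i.e. "Linux" or "Darwin".
    """
    if system == "Linux":
        # --no-as-needed is a GNU ld option, so it is emitted on Linux only.
        return ["-Wl,--no-as-needed"]
    if system == "Darwin":
        # The macOS linker does not accept --no-as-needed; undefined symbols
        # are instead left to be resolved at load time.
        return ["-Wl,-undefined,dynamic_lookup"]
    raise ValueError(f"unsupported platform: {system}")


if __name__ == "__main__":
    print(extension_linkopts(platform.system()))
```

Keeping the two flag sets in separate branches avoids handing either linker an option it does not understand, which is the failure mode this commit addresses.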
Change --copt to --conlyopt for C-specific warning flags to fix "unrecognized command line option" errors on Linux. Make macOS iconv configuration conditional via environment variable. Add Linux conda environment specification with manylinux2014 toolchain.
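The difference between the three Bazel options can be modelled in a few lines: `--copt` is passed to both C and C++ compile actions, `--conlyopt` only to C, and `--cxxopt` only to C++. The sketch below is a simplified model under that assumption; `effective_flags` is a hypothetical helper, and real Bazel flag ordering and toolchain defaults are not represented.

```python
from typing import Dict, List


def effective_flags(language: str, options: Dict[str, List[str]]) -> List[str]:
    """Return the flags a compile action for `language` ("c" or "c++") would see."""
    flags = list(options.get("copt", []))        # --copt: C and C++
    if language == "c":
        flags += options.get("conlyopt", [])     # --conlyopt: C only
    elif language == "c++":
        flags += options.get("cxxopt", [])       # --cxxopt: C++ only
    else:
        raise ValueError(f"unknown language: {language}")
    return flags


c_only_warning = "-Wno-implicit-function-declaration"

# Before: the C-specific flag also reaches C++ compilations.
before = {"copt": [c_only_warning]}
# After: the flag is scoped to C sources.
after = {"conlyopt": [c_only_warning], "cxxopt": ["-std=c++17"]}

print("C++ before:", effective_flags("c++", before))
print("C++ after: ", effective_flags("c++", after))
print("C after:   ", effective_flags("c", after))
```

With the flag moved to `conlyopt`, the C++ compiler no longer receives an option meant for C, while C sources still compile with the warning suppressed.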
Add conda/micromamba-based GitHub Actions workflows for building and testing ml-metadata wheels. Uses GCC 8.5.0 to match manylinux2014 compatibility, supports Python 3.9-3.11, and includes automated PyPI publishing.
Remove obsolete CI workflows in preparation for updated build system. The removed workflows relied on Docker-based builds that are being replaced with native platform builds.
Overview
This PR removes the ZetaSQL dependency, modernizes the build toolchain to support Ubuntu 24.04, GCC 13, and Python 3.13, and enables native builds on macOS ARM64 (Apple Silicon). The changes improve long-term maintainability, reduce build complexity, and eliminate Docker dependencies while maintaining backward compatibility across Linux and macOS platforms.
ZetaSQL Removal

Removed ZetaSQL integration and the filter_query functionality to eliminate a complex dependency chain and simplify the build process.

Changes:
- Removed com_google_zetasql from WORKSPACE
- Removed filter_query in query executors (returns UnimplementedError)
- Removed filter_query tests from metadata_store_test.py

Impact: Users relying on filter_query should migrate to standard list operations with client-side filtering.

Toolchain Updates Completed
Bazel Dependencies
Build Configuration (.bazelrc)
- darwin_arm64
- -Wl,-undefined,dynamic_lookup for proper Python extension symbol resolution
- -Wno-error=c23-extensions for modern Clang compatibility
- CMAKE_ICONV_FLAG environment variable for libmysqlclient iconv configuration
- Linker flag (--no-as-needed) applied only on Linux
- C11 (--conlyopt=-std=c11)
- C++17 (--cxxopt/-std=c++17, --host_cxxopt="-std=c++17")
- Warning suppressions (-Wno-array-parameter, -Wno-implicit-function-declaration) via --conlyopt
- -fpermissive
- -Wno-error

Platform-Specific Fixes
- darwin_arm64 config for Apple Silicon
- uint/ushort/ulong type mappings for GCC 13 / Ubuntu 24.04
- -liconv linking for character encoding support via conda libiconv
- _GNU_SOURCE to fix implicit function declarations
- CMAKE_ICONV_FLAG environment variable

CI/CD - Conda-Based Native Builds
Replaced Docker-based CI with native conda builds for better platform support and maintainability.
- .github/workflows/build.yml and .github/workflows/test.yml (Docker-based)
- .github/workflows/conda-build.yml: Native build workflow with micromamba
- .github/workflows/conda-test.yml: Native test workflow
- ci/environment-macos.yml: macOS-specific conda environment (libiconv, delocate)
- ci/environment.yml: Linux-specific conda environment (GCC 8.x, auditwheel, manylinux2014 sysroot)
- ubuntu-latest, macos-latest (x86_64 and ARM64 via GitHub-hosted runners)
- auditwheel (Linux manylinux2014), delocate (macOS)

Updates Intentionally Avoided
Bazel 6.5.0 (7.4.1 attempted)
Reason: gRPC 1.46.3 incompatible with Bazel 7.x
Error: 'apple_common.multi_arch_split' value has no field or method 'platform_type'
Blocker: Requires gRPC upgrade to 1.50+, which has breaking API changes

Abseil 20230802.1 (20240722.0 attempted)
Reason: API changes break existing code
Error: 'StrCat' is not a member of 'absl'
Blocker: Requires comprehensive codebase refactoring for new Abseil API

Protobuf 3.21.9 (3.25.5 attempted)
Reason: Bazel rules removed in newer versions
Error: file '@com_google_protobuf//:protobuf.bzl' does not contain symbol 'cc_proto_library'
Blocker: Requires migration to new Protobuf Bazel rules (protobuf-22.x or 4.x)

PostgreSQL 12.1 (16.6 and 17.2 attempted)
Reason: File structure completely changed between major versions
Error: missing input file '@postgresql//:src/port/thread.c' (and others)
Blocker: Requires complete rewrite of postgresql.BUILD for new file structure

Next Steps
Future Improvements
- postgresql.BUILD for modern PostgreSQL versions
- macos-14 when available for native ARM64 CI builds

Testing Checklist
Compatibility
- filter_query feature disabled

Tested on:
Build System: Bazel 6.1.0 via Bazelisk 1.20.0