-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Feat/hive support #36821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Feat/hive support #36821
Conversation
|
CodeAnt AI is reviewing your PR. Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
Nitpicks 🔍
|
| s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
| s.settimeout(5) | ||
| s.connect((ip, port)) | ||
| print(f"TCP Connection Successful to {ip}:{port}") | ||
| s.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Resource leak: the TCP socket s is only closed on the success path; if s.connect() raises an exception the socket remains open. Use a context manager or ensure s.close() runs in a finally block so the socket is always closed. [resource leak]
Severity Level: Minor
| s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | |
| s.settimeout(5) | |
| s.connect((ip, port)) | |
| print(f"TCP Connection Successful to {ip}:{port}") | |
| s.close() | |
| # create_connection handles DNS/IPv4/IPv6 resolution and returns a socket that supports context manager | |
| with socket.create_connection((ip, port), timeout=5) as s: | |
| print(f"TCP Connection Successful to {ip}:{port}") |
Why it matters? ⭐
This is a valid resource-leak bug: if connect() raises, the socket object may be left open. The improved code uses socket.create_connection within a context manager which ensures the socket is closed on all paths and simplifies timeout handling. It's a straightforward correctness fix.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** debug_hive.py
**Line:** 19:23
**Comment:**
*Resource Leak: Resource leak: the TCP socket `s` is only closed on the success path; if `s.connect()` raises an exception the socket remains open. Use a context manager or ensure `s.close()` runs in a finally block so the socket is always closed.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| @@ -0,0 +1,16 @@ | |||
| FROM apache/superset:latest | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Using the unpinned image tag :latest makes builds non-reproducible and can introduce breaking or insecure changes; expose a build argument so callers can pin the Superset base image/version. [possible bug]
Severity Level: Critical 🚨
| FROM apache/superset:latest | |
| ARG SUPERSET_VERSION=latest | |
| FROM apache/superset:${SUPERSET_VERSION} |
Why it matters? ⭐
Using :latest makes image builds non-reproducible and can silently pull breaking or insecure updates. Exposing a build ARG to allow pinning the Superset base image is a minimal, low-risk improvement that helps reproducible builds.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** docker/Dockerfile.hive
**Line:** 1:1
**Comment:**
*Possible Bug: Using the unpinned image tag `:latest` makes builds non-reproducible and can introduce breaking or insecure changes; expose a build argument so callers can pin the Superset base image/version.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Install python dependencies inside the virtual environment using uv | ||
| RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: The command uv pip ... is invalid/unknown in the base image and will cause the RUN step to fail; also installing into a virtualenv path that may not exist will break the build. Use the venv's pip if present, otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving pip cache in the layer. [possible bug]
Severity Level: Critical 🚨
| RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary | |
| RUN if [ -x /app/.venv/bin/pip ]; then \ | |
| /app/.venv/bin/pip install --no-cache-dir pyhive thrift thrift-sasl psycopg2-binary; \ | |
| else \ | |
| python3 -m pip install --no-cache-dir pyhive thrift thrift-sasl psycopg2-binary; \ | |
| fi |
Why it matters? ⭐
The RUN line in the PR uses "uv pip ..." which looks wrong for the apache/superset image — "uv" is not a standard wrapper and will likely make the build fail. The proposed fallback to use the venv pip if present or python3 -m pip is sensible and the addition of --no-cache-dir avoids leaving pip cache in the layer. This directly fixes a build-time break.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** docker/Dockerfile.hive
**Line:** 14:14
**Comment:**
*Possible Bug: The command `uv pip ...` is invalid/unknown in the base image and will cause the RUN step to fail; also installing into a virtualenv path that may not exist will break the build. Use the venv's pip if present, otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving pip cache in the layer.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| fi | ||
| # Skip postgres requirements installation for workers to avoid conflicts | ||
| if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ]; then | ||
| if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Using whoami in a command substitution can fail on minimal images (or be unavailable); replace it with a numeric UID check (id -u) which is more robust and avoids depending on whoami. [possible bug]
Severity Level: Critical 🚨
| if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then | |
| if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(id -u)" -eq 0 ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then |
Why it matters? ⭐
Replacing whoami with id -u -eq 0 is more robust in minimal containers and POSIX-y. It's a sensible hardening to avoid runtime failure when whoami isn't present. Be mindful to use the same approach consistently across the script.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** docker/docker-bootstrap.sh
**Line:** 45:45
**Comment:**
*Possible Bug: Using `whoami` in a command substitution can fail on minimal images (or be unavailable); replace it with a numeric UID check (`id -u`) which is more robust and avoids depending on `whoami`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| # limitations under the License. | ||
| # | ||
| set -e | ||
| exec > >(tee -a /app/docker/debug.log) 2>&1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable. [possible bug]
Severity Level: Critical 🚨
| exec > >(tee -a /app/docker/debug.log) 2>&1 | |
| if mkdir -p /app/docker && touch /app/docker/debug.log && [ -w /app/docker/debug.log ]; then | |
| exec > >(tee -a /app/docker/debug.log) 2>&1 | |
| else | |
| echo "Warning: cannot write to /app/docker/debug.log, continuing with stdout/stderr only" | |
| fi |
Why it matters? ⭐
The improved code safely checks that the directory/file exist and are writable and falls back to console logging if not — that's a practical robustness improvement.
Note: in bash with set -e, commands inside the if condition won't abort the script, so this construct is safe; you may still want to combine this with permission setting or clear logging policy for mounted volumes.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** docker/docker-init.sh
**Line:** 19:19
**Comment:**
*Possible Bug: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
CodeAnt AI finished reviewing your PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review Agent Run #c576fd
Actionable Suggestions - 1
-
docker/Dockerfile.hive - 1
- Incorrect UV Install · Line 14-14
Additional Suggestions - 6
-
docker/Dockerfile.hive - 2
-
debug_hive.py - 4
-
Inconsistent failure handling · Line 25-26The script continues testing PyHive after TCP connection fails, but the inline comment suggests it should exit since PyHive depends on TCP. This could lead to unnecessary failed attempts in debug output.
-
Code duplication · Line 30-44The code for testing PyHive connections is duplicated; consider extracting into a helper function for better maintainability.
-
Missing license header · Line 1-1New files in the Apache Superset repository must include the ASF license header for compliance.
-
Missing shebang · Line 1-1Python scripts should include a shebang for proper execution, especially if intended to be run directly.
-
Review Details
-
Files reviewed - 4 · Commit Range:
a4d837b..9ecc516- debug_hive.py
- docker/Dockerfile.hive
- docker/docker-bootstrap.sh
- docker/docker-init.sh
-
Files skipped - 1
- docker-compose-non-dev.yml - Reason: Filter setting
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at [email protected].
Documentation & Help
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Install python dependencies inside the virtual environment using uv | ||
| RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The --system flag installs packages globally, not in the venv. Remove it to match main Dockerfile patterns.
Code Review Run #c576fd
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
User description
SUMMARY
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION
CodeAnt-AI Description
Add Hive-ready Superset image and production non-dev compose with connectivity helpers
What Changed
Impact
✅ Can connect to Hive from composed environment✅ Shorter startup when using the prebuilt Hive image (no local rebuild)✅ Clearer init logs and connectivity diagnostics for Hive setups💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.