Feat/hive support #36821

sp-202 · 2025-12-24T06:12:38Z

User description

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

- Has associated issue:
- Required feature flags:
- Changes UI
- Includes DB Migration (follow approval process in SIP-59)
  - Migration is atomic, supports rollback & is backwards-compatible
  - Confirm DB migration upgrade and downgrade tested
  - Runtime estimates and downtime expectations provided
- Introduces new feature or API
- Removes existing feature or API

CodeAnt-AI Description

Add Hive-ready Superset image and production non-dev compose with connectivity helpers

What Changed

Introduces a Hive-enabled Superset image and Dockerfile so the app container includes Hive client libraries and Python drivers for connecting to Hive
Updates the non-dev production compose to run the prebuilt Hive image for app, init, and workers, attach services to an external "databricks-net" network, and set production environment variables so containers start with Hive support without local rebuilds
Writes container init output to /app/docker/debug.log and adds a small debug script to test DNS/TCP/PyHive connectivity to a Hive server for easier troubleshooting
Skips installing Postgres dev requirements during bootstrap when SUPERSET_ENV=production to avoid extra install time in production-mode containers

Impact

✅ Can connect to Hive from composed environment
✅ Shorter startup when using the prebuilt Hive image (no local rebuild)
✅ Clearer init logs and connectivity diagnostics for Hive setups

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

codeant-ai-for-open-source · 2025-12-24T06:12:43Z

CodeAnt AI is reviewing your PR.

Thanks for using CodeAnt! 🎉

We're free for open-source projects. If you're enjoying it, help us grow by sharing.

codeant-ai-for-open-source · 2025-12-24T06:15:58Z

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

- Sensitive logs: The new exec redirection routes all stdout/stderr into /app/docker/debug.log. That will capture any secrets or sensitive environment values printed later (for example DB URIs, passwords, or exported test URIs), so reviewers should validate what information may be emitted and whether sensitive values are filtered or masked before being logged.
- Log file management: Appending all container output to a single file inside the container can cause uncontrolled file growth, lack of rotation, and permission issues. Confirm that the target directory exists, appropriate file permissions are set, and a rotation/retention strategy is in place to avoid disk exhaustion and information exposure.
- Reproducibility / Pinning: The base image is referenced as `apache/superset:latest` and the pip packages are installed without pinned versions. Unpinned images and packages make builds non-reproducible and increase the chance of unexpected breakage or supply-chain vulnerabilities. Consider pinning the base image and the Python packages to explicit versions or using a lockfile.
- Image size / build deps left behind: The Dockerfile installs build-time packages (e.g. `build-essential`, `libsasl2-dev`, `libpq-dev`) but does not remove them after building Python wheels. This leaves unnecessary toolchains in the final image, increasing image size and attack surface. Use a single RUN layer that installs deps, installs Python packages, then uninstalls build-only deps (or use a multi-stage build).
- Unclear pip / venv invocation: The line uses `uv pip install --system --python /app/.venv ...`. It's not obvious whether `uv` is present in the base image and whether this reliably installs into the intended virtualenv. Prefer invoking the venv's pip directly (`/app/.venv/bin/pip`) or using a known wrapper to avoid hidden dependencies/failures.
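One way to act on the "Sensitive logs" point is to mask credentials before a connection URI is printed. The following is a minimal sketch, assuming URIs of the usual `scheme://user:password@host:port/path` shape; the helper name `mask_uri` is hypothetical and is not part of this PR or of Superset.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_uri(uri: str) -> str:
    """Return the URI with any password replaced by '***' (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.password is None:
        # Nothing to hide, return the input unchanged.
        return uri
    # Rebuild the network location without the real password.
    # Note: hostname is lowercased and IPv6 brackets are not restored here.
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(mask_uri("hive://user:secret@host:10000/default"))
```

Masking at the point of printing only covers values the script controls; output from other programs captured by the redirection would still need separate review.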

codeant-ai-for-open-source · 2025-12-24T06:17:20Z

debug_hive.py

+    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    s.settimeout(5)
+    s.connect((ip, port))
+    print(f"TCP Connection Successful to {ip}:{port}")
+    s.close()


Suggestion: Resource leak: the TCP socket s is only closed on the success path; if s.connect() raises an exception the socket remains open. Use a context manager or ensure s.close() runs in a finally block so the socket is always closed. [resource leak]

Severity Level: Minor ⚠️

Suggested change

-    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-    s.settimeout(5)
-    s.connect((ip, port))
-    print(f"TCP Connection Successful to {ip}:{port}")
-    s.close()
+    # create_connection handles DNS/IPv4/IPv6 resolution and returns a socket that supports context manager
+    with socket.create_connection((ip, port), timeout=5) as s:
+        print(f"TCP Connection Successful to {ip}:{port}")

Why it matters? ⭐

This is a valid resource-leak bug: if connect() raises, the socket object may be left open. The improved code uses socket.create_connection within a context manager which ensures the socket is closed on all paths and simplifies timeout handling. It's a straightforward correctness fix.
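The suggestion can be sketched as a small standalone helper. This is an illustration under assumptions, not the code in debug_hive.py: the function name `check_tcp`, its boolean return value, and the failure message are hypothetical.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # The context manager closes the socket on every path,
        # including when an exception is raised inside the block.
        with socket.create_connection((host, port), timeout=timeout):
            print(f"TCP Connection Successful to {host}:{port}")
            return True
    except OSError as exc:
        # Timeouts, refused connections and DNS failures are all OSError subclasses.
        print(f"TCP connection to {host}:{port} failed: {exc}")
        return False
```

If `create_connection` itself fails, no socket object is returned, so there is nothing left to close in the `except` branch.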

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** debug_hive.py **Line:** 19:23 **Comment:** *Resource Leak: Resource leak: the TCP socket `s` is only closed on the success path; if `s.connect()` raises an exception the socket remains open. Use a context manager or ensure `s.close()` runs in a finally block so the socket is always closed. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

codeant-ai-for-open-source · 2025-12-24T06:17:20Z

docker/Dockerfile.hive

@@ -0,0 +1,16 @@
+FROM apache/superset:latest


Suggestion: Using the unpinned image tag :latest makes builds non-reproducible and can introduce breaking or insecure changes; expose a build argument so callers can pin the Superset base image/version. [possible bug]

Severity Level: Critical 🚨

Suggested change

-FROM apache/superset:latest
+ARG SUPERSET_VERSION=latest
+FROM apache/superset:${SUPERSET_VERSION}

Why it matters? ⭐

Using :latest makes image builds non-reproducible and can silently pull breaking or insecure updates. Exposing a build ARG to allow pinning the Superset base image is a minimal, low-risk improvement that helps reproducible builds.

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** docker/Dockerfile.hive **Line:** 1:1 **Comment:** *Possible Bug: Using the unpinned image tag `:latest` makes builds non-reproducible and can introduce breaking or insecure changes; expose a build argument so callers can pin the Superset base image/version. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

codeant-ai-for-open-source · 2025-12-24T06:17:20Z

docker/Dockerfile.hive

+    && rm -rf /var/lib/apt/lists/*
+
+# Install python dependencies inside the virtual environment using uv
+RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary


Suggestion: The command uv pip ... is invalid/unknown in the base image and will cause the RUN step to fail; also installing into a virtualenv path that may not exist will break the build. Use the venv's pip if present, otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving pip cache in the layer. [possible bug]

Severity Level: Critical 🚨

Suggested change

-RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary
+RUN if [ -x /app/.venv/bin/pip ]; then \
+      /app/.venv/bin/pip install --no-cache-dir pyhive thrift thrift-sasl psycopg2-binary; \
+    else \
+      python3 -m pip install --no-cache-dir pyhive thrift thrift-sasl psycopg2-binary; \
+    fi

Why it matters? ⭐

The RUN line in the PR uses "uv pip ..." which looks wrong for the apache/superset image — "uv" is not a standard wrapper and will likely make the build fail. The proposed fallback to use the venv pip if present or python3 -m pip is sensible and the addition of --no-cache-dir avoids leaving pip cache in the layer. This directly fixes a build-time break.
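Whichever install command ends up in the Dockerfile, it may help to confirm from inside the container that the drivers are importable by the interpreter the app actually runs. A minimal sketch, assuming the import names `pyhive`, `thrift`, `thrift_sasl` and `psycopg2` for the four packages in the RUN line; the helper name `missing_modules` is hypothetical.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed for: pyhive, thrift, thrift-sasl, psycopg2-binary.
    required = ["pyhive", "thrift", "thrift_sasl", "psycopg2"]
    # sys.prefix shows whether this is the virtualenv or the system Python.
    print(f"interpreter: {sys.executable} (prefix: {sys.prefix})")
    missing = missing_modules(required)
    print("missing modules:", ", ".join(missing) if missing else "none")
```

Running it with the virtualenv's interpreter and again with the system `python3` would show which environment received the packages.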

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** docker/Dockerfile.hive **Line:** 14:14 **Comment:** *Possible Bug: The command `uv pip ...` is invalid/unknown in the base image and will cause the RUN step to fail; also installing into a virtualenv path that may not exist will break the build. Use the venv's pip if present, otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving pip cache in the layer. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

codeant-ai-for-open-source · 2025-12-24T06:17:20Z

docker/docker-bootstrap.sh

 fi
 # Skip postgres requirements installation for workers to avoid conflicts
-if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ]; then
+if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then


Suggestion: Using whoami in a command substitution can fail on minimal images (or be unavailable); replace it with a numeric UID check (id -u) which is more robust and avoids depending on whoami. [possible bug]

Severity Level: Critical 🚨

Suggested change

-if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then
+if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(id -u)" -eq 0 ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then

Why it matters? ⭐

Replacing whoami with id -u -eq 0 is more robust in minimal containers and POSIX-y. It's a sensible hardening to avoid runtime failure when whoami isn't present. Be mindful to use the same approach consistently across the script.
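Read as a predicate, the condition in this hunk combines four independent checks. The sketch below restates it in Python purely for illustration; the real script is bash, and the function and parameter names here are hypothetical.

```python
import os


def should_install_postgres_requirements(
    dialect: str, uid: int, role: str, superset_env: str
) -> bool:
    """Mirror of the bootstrap condition: all four checks must hold."""
    return (
        dialect.startswith("postgres")          # matches the postgres* glob
        and uid == 0                            # numeric root check, like `id -u`
        and role not in ("worker", "beat")      # first script argument
        and superset_env != "production"        # new check added by this PR
    )


if __name__ == "__main__":
    # os.getuid() is the Python counterpart of `id -u` on POSIX systems.
    print(
        should_install_postgres_requirements(
            os.environ.get("DATABASE_DIALECT", ""),
            os.getuid(),
            "app",
            os.environ.get("SUPERSET_ENV", ""),
        )
    )
```

Note that an unset SUPERSET_ENV compares as not equal to "production", so the install still runs by default.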

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** docker/docker-bootstrap.sh **Line:** 45:45 **Comment:** *Possible Bug: Using `whoami` in a command substitution can fail on minimal images (or be unavailable); replace it with a numeric UID check (`id -u`) which is more robust and avoids depending on `whoami`. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

codeant-ai-for-open-source · 2025-12-24T06:17:20Z

docker/docker-init.sh

 # limitations under the License.
 #
 set -e
+exec > >(tee -a /app/docker/debug.log) 2>&1


Suggestion: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable. [possible bug]

Severity Level: Critical 🚨

Suggested change

-exec > >(tee -a /app/docker/debug.log) 2>&1
+if mkdir -p /app/docker && touch /app/docker/debug.log && [ -w /app/docker/debug.log ]; then
+  exec > >(tee -a /app/docker/debug.log) 2>&1
+else
+  echo "Warning: cannot write to /app/docker/debug.log, continuing with stdout/stderr only"
+fi

Why it matters? ⭐

The improved code safely checks that the directory/file exist and are writable and falls back to console logging if not — that's a practical robustness improvement.
Note: in bash with set -e, commands inside the if condition won't abort the script, so this construct is safe; you may still want to combine this with permission setting or clear logging policy for mounted volumes.
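The same "file if writable, console otherwise" idea can be sketched for Python-side diagnostics such as debug_hive.py. This is an illustration only; the function name `build_logger`, the logger name, and the rotation limits (5 MiB, 3 backups) are assumptions, not values from this PR.

```python
import logging
import logging.handlers
import os
import sys


def build_logger(log_path: str, name: str = "superset-init") -> logging.Logger:
    """Log to stdout always, and to a size-capped file when it is writable."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(sys.stdout))
    try:
        os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
        # RotatingFileHandler opens the file immediately, so an unwritable
        # path fails here rather than on the first log call.
        file_handler = logging.handlers.RotatingFileHandler(
            log_path, maxBytes=5 * 1024 * 1024, backupCount=3
        )
    except OSError as exc:
        logger.warning("cannot write to %s (%s); continuing with console only", log_path, exc)
    else:
        logger.addHandler(file_handler)
    return logger
```

The size cap also bounds file growth, which a plain `tee -a` does not.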

Prompt for AI Agent 🤖

This is a comment left during a code review. **Path:** docker/docker-init.sh **Line:** 19:19 **Comment:** *Possible Bug: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

codeant-ai-for-open-source · 2025-12-24T06:17:24Z

CodeAnt AI finished reviewing your PR.

bito-code-review

Code Review Agent Run #c576fd

Actionable Suggestions - 1

docker/Dockerfile.hive - 1
- Incorrect UV Install · Line 14-14

Additional Suggestions - 6

docker/Dockerfile.hive - 2
- Unpinned Base Image · Line 1-1
  
  Using :latest can cause build inconsistencies. Pin to the current latest version 5.0.0 for reproducibility.
- Missing License Header · Line 1-1
  
  This new file lacks the required ASF license header. Add the standard 16-line Apache header before the FROM instruction.
debug_hive.py - 4
- Inconsistent failure handling · Line 25-26
  
  The script continues testing PyHive after TCP connection fails, but the inline comment suggests it should exit since PyHive depends on TCP. This could lead to unnecessary failed attempts in debug output.
- Code duplication · Line 30-44
  
  The code for testing PyHive connections is duplicated; consider extracting into a helper function for better maintainability.
- Missing license header · Line 1-1
  
  New files in the Apache Superset repository must include the ASF license header for compliance.
- Missing shebang · Line 1-1
  
  Python scripts should include a shebang for proper execution, especially if intended to be run directly.
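The "Inconsistent failure handling" and "Code duplication" items for debug_hive.py could both be addressed with one small runner that executes the checks in dependency order and stops at the first failure. A minimal sketch; the name `run_checks` and the check labels are hypothetical, and the real checks (DNS, TCP, PyHive) are stood in for by placeholder callables.

```python
from typing import Callable, List, Tuple

# A check is a label plus a callable that raises on failure.
Check = Tuple[str, Callable[[], None]]


def run_checks(checks: List[Check]) -> List[str]:
    """Run checks in order; stop at the first failure and return the names that passed."""
    passed: List[str] = []
    for name, check in checks:
        try:
            check()
        except Exception as exc:
            print(f"[FAIL] {name}: {exc}")
            # Later checks depend on this one, so do not attempt them.
            break
        print(f"[ OK ] {name}")
        passed.append(name)
    return passed


if __name__ == "__main__":
    def tcp_down() -> None:
        raise ConnectionRefusedError("connection refused")

    # Placeholder checks: DNS succeeds, TCP fails, so PyHive is never attempted.
    run_checks([("dns", lambda: None), ("tcp", tcp_down), ("pyhive", lambda: None)])
```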

Review Details

Files reviewed - 4 · Commit Range: a4d837b..9ecc516
- debug_hive.py
- docker/Dockerfile.hive
- docker/docker-bootstrap.sh
- docker/docker-init.sh
Files skipped - 1
- docker-compose-non-dev.yml - Reason: Filter setting
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type one of the following commands in a pull request comment and save the comment.

/review - Manually triggers a full AI review.
/pause - Pauses automatic reviews on this pull request.
/resume - Resumes automatic reviews.
/resolve - Marks all Bito-posted review comments as resolved.
/abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset. You can customize the agent settings here or contact your Bito workspace admin at [email protected].

bito-code-review · 2025-12-24T06:25:22Z

docker/Dockerfile.hive

+    && rm -rf /var/lib/apt/lists/*
+
+# Install python dependencies inside the virtual environment using uv
+RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary


Incorrect UV Install

The --system flag installs packages globally, not in the venv. Remove it to match main Dockerfile patterns.

Code Review Run #c576fd

Ubuntu and others added 7 commits December 10, 2025 06:25

network changed

a4d837b

Merge branch 'apache:master' into master

94909e1

hive support added and all containers are putted in same container

46e4708

python file command changed

5739b08

Merge remote-tracking branch 'origin/master'

6e04ea4

Merge branch 'apache:master' into master

7e26bdd

feat: Add Dockerfile.hive and update compose/scripts for production H…

9ecc516

…ive support

pull-request-size bot added the size/L label Dec 24, 2025

dosubot bot added the data:connect:hive Related to Hive label Dec 24, 2025

codeant-ai-for-open-source bot added the size:L This PR changes 100-499 lines, ignoring generated files label Dec 24, 2025

codeant-ai-for-open-source bot reviewed Dec 24, 2025

bito-code-review bot reviewed Dec 24, 2025
