Skip to content

Conversation

@sp-202
Copy link

@sp-202 sp-202 commented Dec 24, 2025

User description

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

CodeAnt-AI Description

Add Hive-ready Superset image and production non-dev compose with connectivity helpers

What Changed

  • Introduces a Hive-enabled Superset image and Dockerfile so the app container includes Hive client libraries and Python drivers for connecting to Hive
  • Updates the non-dev production compose to run the prebuilt Hive image for app, init, and workers, attach services to an external "databricks-net" network, and set production environment variables so containers start with Hive support without local rebuilds
  • Writes container init output to /app/docker/debug.log and adds a small debug script to test DNS/TCP/PyHive connectivity to a Hive server for easier troubleshooting
  • Skips installing Postgres dev requirements during bootstrap when SUPERSET_ENV=production to avoid extra install time in production-mode containers

Impact

✅ Can connect to Hive from composed environment
✅ Shorter startup when using the prebuilt Hive image (no local rebuild)
✅ Clearer init logs and connectivity diagnostics for Hive setups

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

@codeant-ai-for-open-source
Copy link
Contributor

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@dosubot dosubot bot added the data:connect:hive Related to Hive label Dec 24, 2025
@codeant-ai-for-open-source codeant-ai-for-open-source bot added the size:L This PR changes 100-499 lines, ignoring generated files label Dec 24, 2025
@codeant-ai-for-open-source
Copy link
Contributor

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Sensitive logs
    The new exec redirection routes all stdout/stderr into /app/docker/debug.log. That will capture any secrets or sensitive environment values printed later (for example DB URIs, passwords, or exported test URIs), so reviewers should validate what information may be emitted and whether sensitive values are filtered or masked before being logged.

  • Log file management
    Appending all container output to a single file inside the container can cause uncontrolled file growth, lack of rotation, and permission issues. Confirm that the target directory exists, appropriate file permissions are set, and a rotation/retention strategy is in place to avoid disk exhaustion and information exposure.

  • Reproducibility / Pinning
    The base image is referenced as apache/superset:latest and the pip packages are installed without pinned versions.
    Unpinned images and packages make builds non-reproducible and increase the chance of unexpected breakage or supply-chain vulnerabilities.
    Consider pinning the base image and the Python packages to explicit versions or using a lockfile.

  • Image size / build deps left behind
    The Dockerfile installs build-time packages (e.g. build-essential, libsasl2-dev, libpq-dev) but does not remove them after building Python wheels.
    This leaves unnecessary toolchains in the final image, increasing image size and attack surface. Use a single RUN layer that installs deps, installs Python packages, then uninstalls build-only deps (or use multi-stage build).

  • Unclear pip / venv invocation
    The line uses uv pip install --system --python /app/.venv .... It's not obvious whether uv is present in the base image and whether this reliably installs into the intended virtualenv.
    Prefer invoking the venv's pip directly (/app/.venv/bin/pip) or using a known wrapper to avoid hidden dependencies/failures.

Comment on lines +19 to +23
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
s.connect((ip, port))
print(f"TCP Connection Successful to {ip}:{port}")
s.close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Resource leak: the TCP socket s is only closed on the success path; if s.connect() raises an exception the socket remains open. Use a context manager or ensure s.close() runs in a finally block so the socket is always closed. [resource leak]

Severity Level: Minor ⚠️

Suggested change
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
s.connect((ip, port))
print(f"TCP Connection Successful to {ip}:{port}")
s.close()
# create_connection handles DNS/IPv4/IPv6 resolution and returns a socket that supports context manager
with socket.create_connection((ip, port), timeout=5) as s:
print(f"TCP Connection Successful to {ip}:{port}")
Why it matters? ⭐

This is a valid resource-leak bug: if connect() raises, the socket object may be left open. The improved code uses socket.create_connection within a context manager which ensures the socket is closed on all paths and simplifies timeout handling. It's a straightforward correctness fix.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** debug_hive.py
**Line:** 19:23
**Comment:**
	*Resource Leak: Resource leak: the TCP socket `s` is only closed on the success path; if `s.connect()` raises an exception the socket remains open. Use a context manager or ensure `s.close()` runs in a finally block so the socket is always closed.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

@@ -0,0 +1,16 @@
FROM apache/superset:latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Using the unpinned image tag :latest makes builds non-reproducible and can introduce breaking or insecure changes; expose a build argument so callers can pin the Superset base image/version. [possible bug]

Severity Level: Critical 🚨

Suggested change
FROM apache/superset:latest
ARG SUPERSET_VERSION=latest
FROM apache/superset:${SUPERSET_VERSION}
Why it matters? ⭐

Using :latest makes image builds non-reproducible and can silently pull breaking or insecure updates. Exposing a build ARG to allow pinning the Superset base image is a minimal, low-risk improvement that helps reproducible builds.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docker/Dockerfile.hive
**Line:** 1:1
**Comment:**
	*Possible Bug: Using the unpinned image tag `:latest` makes builds non-reproducible and can introduce breaking or insecure changes; expose a build argument so callers can pin the Superset base image/version.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

&& rm -rf /var/lib/apt/lists/*

# Install python dependencies inside the virtual environment using uv
RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The command uv pip ... is invalid/unknown in the base image and will cause the RUN step to fail; also installing into a virtualenv path that may not exist will break the build. Use the venv's pip if present, otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving pip cache in the layer. [possible bug]

Severity Level: Critical 🚨

Suggested change
RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary
RUN if [ -x /app/.venv/bin/pip ]; then \
/app/.venv/bin/pip install --no-cache-dir pyhive thrift thrift-sasl psycopg2-binary; \
else \
python3 -m pip install --no-cache-dir pyhive thrift thrift-sasl psycopg2-binary; \
fi
Why it matters? ⭐

The RUN line in the PR uses "uv pip ..." which looks wrong for the apache/superset image — "uv" is not a standard wrapper and will likely make the build fail. The proposed fallback to use the venv pip if present or python3 -m pip is sensible and the addition of --no-cache-dir avoids leaving pip cache in the layer. This directly fixes a build-time break.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docker/Dockerfile.hive
**Line:** 14:14
**Comment:**
	*Possible Bug: The command `uv pip ...` is invalid/unknown in the base image and will cause the RUN step to fail; also installing into a virtualenv path that may not exist will break the build. Use the venv's pip if present, otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving pip cache in the layer.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

fi
# Skip postgres requirements installation for workers to avoid conflicts
if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ]; then
if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Using whoami in a command substitution can fail on minimal images (or be unavailable); replace it with a numeric UID check (id -u) which is more robust and avoids depending on whoami. [possible bug]

Severity Level: Critical 🚨

Suggested change
if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then
if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(id -u)" -eq 0 ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then
Why it matters? ⭐

Replacing whoami with id -u -eq 0 is more robust in minimal containers and POSIX-y. It's a sensible hardening to avoid runtime failure when whoami isn't present. Be mindful to use the same approach consistently across the script.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docker/docker-bootstrap.sh
**Line:** 45:45
**Comment:**
	*Possible Bug: Using `whoami` in a command substitution can fail on minimal images (or be unavailable); replace it with a numeric UID check (`id -u`) which is more robust and avoids depending on `whoami`.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

# limitations under the License.
#
set -e
exec > >(tee -a /app/docker/debug.log) 2>&1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable. [possible bug]

Severity Level: Critical 🚨

Suggested change
exec > >(tee -a /app/docker/debug.log) 2>&1
if mkdir -p /app/docker && touch /app/docker/debug.log && [ -w /app/docker/debug.log ]; then
exec > >(tee -a /app/docker/debug.log) 2>&1
else
echo "Warning: cannot write to /app/docker/debug.log, continuing with stdout/stderr only"
fi
Why it matters? ⭐

The improved code safely checks that the directory/file exist and are writable and falls back to console logging if not — that's a practical robustness improvement.
Note: in bash with set -e, commands inside the if condition won't abort the script, so this construct is safe; you may still want to combine this with permission setting or clear logging policy for mounted volumes.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docker/docker-init.sh
**Line:** 19:19
**Comment:**
	*Possible Bug: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

@codeant-ai-for-open-source
Copy link
Contributor

CodeAnt AI finished reviewing your PR.

Copy link
Contributor

@bito-code-review bito-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #c576fd

Actionable Suggestions - 1
  • docker/Dockerfile.hive - 1
Additional Suggestions - 6
  • docker/Dockerfile.hive - 2
    • Unpinned Base Image · Line 1-1
      Using :latest can cause build inconsistencies. Pin to the current latest version 5.0.0 for reproducibility.
    • Missing License Header · Line 1-1
      This new file lacks the required ASF license header. Add the standard 16-line Apache header before the FROM instruction.
  • debug_hive.py - 4
    • Inconsistent failure handling · Line 25-26
      The script continues testing PyHive after TCP connection fails, but the inline comment suggests it should exit since PyHive depends on TCP. This could lead to unnecessary failed attempts in debug output.
    • Code duplication · Line 30-44
      The code for testing PyHive connections is duplicated; consider extracting into a helper function for better maintainability.
    • Missing license header · Line 1-1
      New files in the Apache Superset repository must include the ASF license header for compliance.
    • Missing shebang · Line 1-1
      Python scripts should include a shebang for proper execution, especially if intended to be run directly.
Review Details
  • Files reviewed - 4 · Commit Range: a4d837b..9ecc516
    • debug_hive.py
    • docker/Dockerfile.hive
    • docker/docker-bootstrap.sh
    • docker/docker-init.sh
  • Files skipped - 1
    • docker-compose-non-dev.yml - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at [email protected].

Documentation & Help

AI Code Review powered by Bito Logo

&& rm -rf /var/lib/apt/lists/*

# Install python dependencies inside the virtual environment using uv
RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect UV Install

The --system flag installs packages globally, not in the venv. Remove it to match main Dockerfile patterns.

Code Review Run #c576fd


Should Bito avoid suggestions like this for future reviews? (Manage Rules)

  • Yes, avoid them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data:connect:hive Related to Hive size/L size:L This PR changes 100-499 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant