Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 44 additions & 0 deletions debug_hive.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
import socket
import sys

host = 'hive-server'
port = 10000

print(f"Testing connectivity to {host}:{port}...")

# Test 1: DNS Resolution
try:
ip = socket.gethostbyname(host)
print(f"DNS Resolved: {host} -> {ip}")
except Exception as e:
print(f"DNS Failed: {e}")
sys.exit(1)

# Test 2: TCP Socket
try:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
s.connect((ip, port))
print(f"TCP Connection Successful to {ip}:{port}")
s.close()
Comment on lines +19 to +23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Resource leak: the TCP socket s is only closed on the success path; if s.connect() raises an exception the socket remains open. Use a context manager or ensure s.close() runs in a finally block so the socket is always closed. [resource leak]

Severity Level: Minor ⚠️

Suggested change
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
s.connect((ip, port))
print(f"TCP Connection Successful to {ip}:{port}")
s.close()
# create_connection handles DNS/IPv4/IPv6 resolution and returns a socket that supports context manager
with socket.create_connection((ip, port), timeout=5) as s:
print(f"TCP Connection Successful to {ip}:{port}")
Why it matters? ⭐

This is a valid resource-leak bug: if connect() raises, the socket object may be left open. The improved code uses socket.create_connection within a context manager which ensures the socket is closed on all paths and simplifies timeout handling. It's a straightforward correctness fix.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** debug_hive.py
**Line:** 19:23
**Comment:**
	*Resource Leak: Resource leak: the TCP socket `s` is only closed on the success path; if `s.connect()` raises an exception the socket remains open. Use a context manager or ensure `s.close()` runs in a finally block so the socket is always closed.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

except Exception as e:
print(f"TCP Connection Failed: {e}")
# sys.exit(1) # Continue to try PyHive if TCP fails? No, PyHive will definitely fail.

# Test 3: PyHive Connection (Minimal)
print("Testing PyHive Connection...")
try:
from pyhive import hive
conn = hive.Connection(host=host, port=port, username='hive', auth='NOSASL') # Try NOSASL first
print("PyHive Connection (NOSASL): SUCCESS")
conn.close()
except Exception as e:
print(f"PyHive Connection (NOSASL) Failed: {e}")

try:
from pyhive import hive
conn = hive.Connection(host=host, port=port, username='hive') # Default auth
print("PyHive Connection (Default): SUCCESS")
conn.close()
except Exception as e:
print(f"PyHive Connection (Default) Failed: {e}")
70 changes: 47 additions & 23 deletions docker-compose-non-dev.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,15 +25,10 @@
# - Set SUPERSET_LOG_LEVEL=debug in docker/.env-local for detailed Superset logs
# -----------------------------------------------------------------------
x-superset-volumes:
&superset-volumes # /app/pythonpath_docker will be appended to the PYTHONPATH in the final container
&superset-volumes
- ./docker:/app/docker
- superset_home:/app/superset_home

x-common-build: &common-build
context: .
target: dev
cache_from:
- apache/superset-cache:3.10-slim-trixie

services:
redis:
Expand All @@ -42,28 +37,37 @@ services:
restart: unless-stopped
volumes:
- redis:/data
networks:
- databricks-net

db:
env_file:
- path: docker/.env # default
- path: docker/.env
required: true
- path: docker/.env-local # optional override
- path: docker/.env-local
required: false
image: postgres:16
container_name: superset_db
restart: unless-stopped
volumes:
- db_home:/var/lib/postgresql/data
- ./docker/docker-entrypoint-initdb.d:/docker-entrypoint-initdb.d
networks:
- databricks-net

superset:
env_file:
- path: docker/.env # default
- path: docker/.env
required: true
- path: docker/.env-local # optional override
- path: docker/.env-local
required: false
environment:
SUPERSET_ENV: production
DEV_MODE: "false"
build:
<<: *common-build
context: ./docker
dockerfile: Dockerfile.hive
image: superset-hive:latest
container_name: superset_app
command: ["/app/docker/docker-bootstrap.sh", "app-gunicorn"]
user: "root"
Expand All @@ -74,43 +78,54 @@ services:
superset-init:
condition: service_completed_successfully
volumes: *superset-volumes
networks:
- databricks-net

superset-init:
container_name: superset_init
build:
<<: *common-build
image: superset-hive:latest

command: ["/app/docker/docker-init.sh"]
env_file:
- path: docker/.env # default
- path: docker/.env
required: true
- path: docker/.env-local # optional override
- path: docker/.env-local
required: false
environment:
SUPERSET_ENV: production
DEV_MODE: "false"
depends_on:
db:
condition: service_started
redis:
condition: service_started
user: "root"
volumes: *superset-volumes
networks:
- databricks-net
healthcheck:
disable: true

superset-worker:
build:
<<: *common-build
image: superset-hive:latest
container_name: superset_worker
command: ["/app/docker/docker-bootstrap.sh", "worker"]
env_file:
- path: docker/.env # default
- path: docker/.env
required: true
- path: docker/.env-local # optional override
- path: docker/.env-local
required: false
environment:
SUPERSET_ENV: production
DEV_MODE: "false"
restart: unless-stopped
depends_on:
superset-init:
condition: service_completed_successfully
user: "root"
volumes: *superset-volumes
networks:
- databricks-net
healthcheck:
test:
[
Expand All @@ -119,21 +134,25 @@ services:
]

superset-worker-beat:
build:
<<: *common-build
image: superset-hive:latest
container_name: superset_worker_beat
command: ["/app/docker/docker-bootstrap.sh", "beat"]
env_file:
- path: docker/.env # default
- path: docker/.env
required: true
- path: docker/.env-local # optional override
- path: docker/.env-local
required: false
environment:
SUPERSET_ENV: production
DEV_MODE: "false"
restart: unless-stopped
depends_on:
superset-init:
condition: service_completed_successfully
user: "root"
volumes: *superset-volumes
networks:
- databricks-net
healthcheck:
disable: true

Expand All @@ -144,3 +163,8 @@ volumes:
external: false
redis:
external: false

networks:
databricks-net:
name: databricks-net
external: true
16 changes: 16 additions & 0 deletions docker/Dockerfile.hive
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
FROM apache/superset:latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Using the unpinned image tag :latest makes builds non-reproducible and can introduce breaking or insecure changes; expose a build argument so callers can pin the Superset base image/version. [possible bug]

Severity Level: Critical 🚨

Suggested change
FROM apache/superset:latest
ARG SUPERSET_VERSION=latest
FROM apache/superset:${SUPERSET_VERSION}
Why it matters? ⭐

Using :latest makes image builds non-reproducible and can silently pull breaking or insecure updates. Exposing a build ARG to allow pinning the Superset base image is a minimal, low-risk improvement that helps reproducible builds.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docker/Dockerfile.hive
**Line:** 1:1
**Comment:**
	*Possible Bug: Using the unpinned image tag `:latest` makes builds non-reproducible and can introduce breaking or insecure changes; expose a build argument so callers can pin the Superset base image/version.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.


USER root

# Install system dependencies for thrift-sasl
RUN apt-get update -q \
&& apt-get install -yq --no-install-recommends \
libsasl2-dev \
build-essential \
libpq-dev \
&& rm -rf /var/lib/apt/lists/*

# Install python dependencies inside the virtual environment using uv
RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The command uv pip ... is invalid/unknown in the base image and will cause the RUN step to fail; also installing into a virtualenv path that may not exist will break the build. Use the venv's pip if present, otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving pip cache in the layer. [possible bug]

Severity Level: Critical 🚨

Suggested change
RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary
RUN if [ -x /app/.venv/bin/pip ]; then \
/app/.venv/bin/pip install --no-cache-dir pyhive thrift thrift-sasl psycopg2-binary; \
else \
python3 -m pip install --no-cache-dir pyhive thrift thrift-sasl psycopg2-binary; \
fi
Why it matters? ⭐

The RUN line in the PR uses "uv pip ..." which looks wrong for the apache/superset image — "uv" is not a standard wrapper and will likely make the build fail. The proposed fallback to use the venv pip if present or python3 -m pip is sensible and the addition of --no-cache-dir avoids leaving pip cache in the layer. This directly fixes a build-time break.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docker/Dockerfile.hive
**Line:** 14:14
**Comment:**
	*Possible Bug: The command `uv pip ...` is invalid/unknown in the base image and will cause the RUN step to fail; also installing into a virtualenv path that may not exist will break the build. Use the venv's pip if present, otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving pip cache in the layer.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect UV Install

The --system flag installs packages globally, not in the venv. Remove it to match main Dockerfile patterns.

Code Review Run #c576fd


Should Bito avoid suggestions like this for future reviews? (Manage Rules)

  • Yes, avoid them


USER superset
2 changes: 1 addition & 1 deletion docker/docker-bootstrap.sh
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ if [ "$CYPRESS_CONFIG" == "true" ]; then
PORT=8081
fi
# Skip postgres requirements installation for workers to avoid conflicts
if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ]; then
if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Using whoami in a command substitution can fail on minimal images (or be unavailable); replace it with a numeric UID check (id -u) which is more robust and avoids depending on whoami. [possible bug]

Severity Level: Critical 🚨

Suggested change
if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then
if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(id -u)" -eq 0 ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then
Why it matters? ⭐

Replacing whoami with id -u -eq 0 is more robust in minimal containers and POSIX-y. It's a sensible hardening to avoid runtime failure when whoami isn't present. Be mindful to use the same approach consistently across the script.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docker/docker-bootstrap.sh
**Line:** 45:45
**Comment:**
	*Possible Bug: Using `whoami` in a command substitution can fail on minimal images (or be unavailable); replace it with a numeric UID check (`id -u`) which is more robust and avoids depending on `whoami`.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

# older images may not have the postgres dev requirements installed
echo "Installing postgres requirements"
if command -v uv > /dev/null 2>&1; then
Expand Down
2 changes: 2 additions & 0 deletions docker/docker-init.sh
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@
# limitations under the License.
#
set -e
exec > >(tee -a /app/docker/debug.log) 2>&1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable. [possible bug]

Severity Level: Critical 🚨

Suggested change
exec > >(tee -a /app/docker/debug.log) 2>&1
if mkdir -p /app/docker && touch /app/docker/debug.log && [ -w /app/docker/debug.log ]; then
exec > >(tee -a /app/docker/debug.log) 2>&1
else
echo "Warning: cannot write to /app/docker/debug.log, continuing with stdout/stderr only"
fi
Why it matters? ⭐

The improved code safely checks that the directory/file exist and are writable and falls back to console logging if not — that's a practical robustness improvement.
Note: in bash with set -e, commands inside the if condition won't abort the script, so this construct is safe; you may still want to combine this with permission setting or clear logging policy for mounted volumes.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** docker/docker-init.sh
**Line:** 19:19
**Comment:**
	*Possible Bug: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.



#
# Always install local overrides first
Expand Down
Loading