-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Feat/hive support #36821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Feat/hive support #36821
Changes from all commits
a4d837b
94909e1
46e4708
5739b08
6e04ea4
7e26bdd
9ecc516
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| import socket | ||
| import sys | ||
|
|
||
| host = 'hive-server' | ||
| port = 10000 | ||
|
|
||
| print(f"Testing connectivity to {host}:{port}...") | ||
|
|
||
| # Test 1: DNS Resolution | ||
| try: | ||
| ip = socket.gethostbyname(host) | ||
| print(f"DNS Resolved: {host} -> {ip}") | ||
| except Exception as e: | ||
| print(f"DNS Failed: {e}") | ||
| sys.exit(1) | ||
|
|
||
| # Test 2: TCP Socket | ||
| try: | ||
| s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
| s.settimeout(5) | ||
| s.connect((ip, port)) | ||
| print(f"TCP Connection Successful to {ip}:{port}") | ||
| s.close() | ||
| except Exception as e: | ||
| print(f"TCP Connection Failed: {e}") | ||
| # sys.exit(1) # Continue to try PyHive if TCP fails? No, PyHive will definitely fail. | ||
|
|
||
| # Test 3: PyHive Connection (Minimal) | ||
| print("Testing PyHive Connection...") | ||
| try: | ||
| from pyhive import hive | ||
| conn = hive.Connection(host=host, port=port, username='hive', auth='NOSASL') # Try NOSASL first | ||
| print("PyHive Connection (NOSASL): SUCCESS") | ||
| conn.close() | ||
| except Exception as e: | ||
| print(f"PyHive Connection (NOSASL) Failed: {e}") | ||
|
|
||
| try: | ||
| from pyhive import hive | ||
| conn = hive.Connection(host=host, port=port, username='hive') # Default auth | ||
| print("PyHive Connection (Default): SUCCESS") | ||
| conn.close() | ||
| except Exception as e: | ||
| print(f"PyHive Connection (Default) Failed: {e}") | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,16 @@ | ||||||||||||||
| FROM apache/superset:latest | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Using the unpinned image tag Severity Level: Critical 🚨
Suggested change
Why it matters? ⭐Using :latest makes image builds non-reproducible and can silently pull breaking or insecure updates. Exposing a build ARG to allow pinning the Superset base image is a minimal, low-risk improvement that helps reproducible builds. Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** docker/Dockerfile.hive
**Line:** 1:1
**Comment:**
*Possible Bug: Using the unpinned image tag `:latest` makes builds non-reproducible and can introduce breaking or insecure changes; expose a build argument so callers can pin the Superset base image/version.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. |
||||||||||||||
|
|
||||||||||||||
| USER root | ||||||||||||||
|
|
||||||||||||||
| # Install system dependencies for thrift-sasl | ||||||||||||||
| RUN apt-get update -q \ | ||||||||||||||
| && apt-get install -yq --no-install-recommends \ | ||||||||||||||
| libsasl2-dev \ | ||||||||||||||
| build-essential \ | ||||||||||||||
| libpq-dev \ | ||||||||||||||
| && rm -rf /var/lib/apt/lists/* | ||||||||||||||
|
|
||||||||||||||
| # Install python dependencies inside the virtual environment using uv | ||||||||||||||
| RUN uv pip install --system --python /app/.venv pyhive thrift thrift-sasl psycopg2-binary | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: The command Severity Level: Critical 🚨
Suggested change
Why it matters? ⭐The RUN line in the PR uses "uv pip ..." which looks wrong for the apache/superset image — "uv" is not a standard wrapper and will likely make the build fail. The proposed fallback to use the venv pip if present or python3 -m pip is sensible and the addition of --no-cache-dir avoids leaving pip cache in the layer. This directly fixes a build-time break. Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** docker/Dockerfile.hive
**Line:** 14:14
**Comment:**
*Possible Bug: The command `uv pip ...` is invalid/unknown in the base image and will cause the RUN step to fail; also installing into a virtualenv path that may not exist will break the build. Use the venv's pip if present, otherwise fall back to the system pip, and add --no-cache-dir to avoid leaving pip cache in the layer.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Incorrect UV Install
The --system flag installs packages globally, not in the venv. Remove it to match main Dockerfile patterns. Code Review Run #c576fd Should Bito avoid suggestions like this for future reviews? (Manage Rules)
|
||||||||||||||
|
|
||||||||||||||
| USER superset | ||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -42,7 +42,7 @@ if [ "$CYPRESS_CONFIG" == "true" ]; then | |||||
| PORT=8081 | ||||||
| fi | ||||||
| # Skip postgres requirements installation for workers to avoid conflicts | ||||||
| if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ]; then | ||||||
| if [[ "$DATABASE_DIALECT" == postgres* ]] && [ "$(whoami)" = "root" ] && [ "$1" != "worker" ] && [ "$1" != "beat" ] && [ "$SUPERSET_ENV" != "production" ]; then | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Using Severity Level: Critical 🚨
Suggested change
Why it matters? ⭐Replacing whoami with id -u -eq 0 is more robust in minimal containers and POSIX-y. It's a sensible hardening to avoid runtime failure when whoami isn't present. Be mindful to use the same approach consistently across the script. Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** docker/docker-bootstrap.sh
**Line:** 45:45
**Comment:**
*Possible Bug: Using `whoami` in a command substitution can fail on minimal images (or be unavailable); replace it with a numeric UID check (`id -u`) which is more robust and avoids depending on `whoami`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. |
||||||
| # older images may not have the postgres dev requirements installed | ||||||
| echo "Installing postgres requirements" | ||||||
| if command -v uv > /dev/null 2>&1; then | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -16,6 +16,8 @@ | |||||||||||||
| # limitations under the License. | ||||||||||||||
| # | ||||||||||||||
| set -e | ||||||||||||||
| exec > >(tee -a /app/docker/debug.log) 2>&1 | ||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suggestion: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable. [possible bug] Severity Level: Critical 🚨
Suggested change
Why it matters? ⭐The improved code safely checks that the directory/file exist and are writable and falls back to console logging if not — that's a practical robustness improvement. Prompt for AI Agent 🤖This is a comment left during a code review.
**Path:** docker/docker-init.sh
**Line:** 19:19
**Comment:**
*Possible Bug: Robustness issue: if /app/docker doesn't exist or isn't writable, the process-substitution tee can fail or exit and break logging/output; check that the directory/file are creatable and writable and fall back to console-only logging when not writable.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. |
||||||||||||||
|
|
||||||||||||||
|
|
||||||||||||||
| # | ||||||||||||||
| # Always install local overrides first | ||||||||||||||
|
|
||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Resource leak: the TCP socket
sis only closed on the success path; ifs.connect()raises an exception the socket remains open. Use a context manager or ensures.close()runs in a finally block so the socket is always closed. [resource leak]Severity Level: Minor⚠️
Why it matters? ⭐
This is a valid resource-leak bug: if connect() raises, the socket object may be left open. The improved code uses socket.create_connection within a context manager which ensures the socket is closed on all paths and simplifies timeout handling. It's a straightforward correctness fix.
Prompt for AI Agent 🤖