Add openGauss Support for CodeGen #2410
base: main
Conversation
Signed-off-by: sunshuang1866 <[email protected]>
for more information, see https://pre-commit.ci
Dependency Review: ✅ No vulnerabilities or license issues found. Scanned files: none.
Pull request overview
This pull request adds comprehensive support for deploying the CodeGen application with openGauss as the vector database on Intel Xeon CPUs. It provides a complete deployment solution including Docker Compose orchestration, environment configuration, testing, and documentation.
Changes:
- Adds Docker Compose configuration (compose_opengauss.yaml) for orchestrating CodeGen services with openGauss database
- Introduces environment setup script (set_env_opengauss.sh) for configuring all service interconnections and variables
- Adds deployment documentation (README_opengauss.md) with setup instructions, validation steps, and UI access guide
- Includes automated test script (test_compose_opengauss_on_xeon.sh) to validate the full deployment pipeline
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| CodeGen/tests/test_compose_opengauss_on_xeon.sh | Automated test script that builds images, launches services, and validates each microservice and the complete pipeline via API calls |
| CodeGen/docker_compose/intel/set_env_opengauss.sh | Environment setup script that exports all required variables for openGauss configuration, service endpoints, model settings, and proxy configuration |
| CodeGen/docker_compose/intel/cpu/xeon/compose_opengauss.yaml | Docker Compose configuration defining all services (openGauss DB, dataprep, embedding, retriever, LLM, backend, UI) with health checks, dependencies, and port mappings |
| CodeGen/docker_compose/intel/cpu/xeon/README_opengauss.md | Comprehensive deployment guide covering environment setup, Docker Compose usage, image building, service validation, and UI access instructions |
```bash
WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')
export host_ip=$(hostname -I | awk '{print $1}')
```
Copilot AI · Jan 28, 2026
The host_ip variable is exported but not used in this script. The script uses ip_address instead (lines 16, 40, etc.). This creates inconsistency. Either remove the export host_ip line or ensure consistent variable naming throughout the script.
Suggested change:

```diff
-export host_ip=$(hostname -I | awk '{print $1}')
```
| echo "Error: HF_TOKEN is not set. Please set HF_TOKEN" | ||
| fi | ||
|
|
||
| if [ -z "${HOST_IP}" ]; then | ||
| echo "Error: HOST_IP is not set. Please set HOST_IP first." |
Copilot AI · Jan 28, 2026
The error messages for missing HF_TOKEN and HOST_IP (lines 16 and 20) only print warnings but do not exit the script or prevent execution. This allows the script to continue with undefined variables, which will cause failures later. Consider adding 'exit 1' after each error message, or using 'set -u' at the beginning of the script to fail on undefined variables.
| echo "Error: HF_TOKEN is not set. Please set HF_TOKEN" | |
| fi | |
| if [ -z "${HOST_IP}" ]; then | |
| echo "Error: HOST_IP is not set. Please set HOST_IP first." | |
| echo "Error: HF_TOKEN is not set. Please set HF_TOKEN" | |
| exit 1 | |
| fi | |
| if [ -z "${HOST_IP}" ]; then | |
| echo "Error: HOST_IP is not set. Please set HOST_IP first." | |
| exit 1 |
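The strict-mode alternative mentioned in the comment could look like this minimal sketch, placed near the top of the script; the `: "${VAR:?...}"` guards and pipefail are additions beyond what the review suggests, shown only to illustrate the idea:

```bash
#!/usr/bin/env bash
# Fail fast: -e exits on any command error, -u treats unset variables
# (such as a missing HF_TOKEN or HOST_IP) as errors, and pipefail makes
# pipelines report the first failing command.
set -euo pipefail

# These guards abort immediately with a clear message if the variable is
# unset, so the script never continues with undefined configuration.
: "${HF_TOKEN:?Error: HF_TOKEN is not set. Please set HF_TOKEN}"
: "${HOST_IP:?Error: HOST_IP is not set. Please set HOST_IP first.}"
```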
~~~markdown
3. Set up other environment variables:

```bash
~~~
Copilot AI · Jan 28, 2026
The environment variable source instruction references a relative path '../set_env_opengauss.sh' which may not work correctly depending on the user's current directory. The documentation should clarify the expected working directory or provide an absolute path based on the repository structure. Consider adding a note about needing to be in the CodeGen/docker_compose/intel/cpu/xeon directory before running this command.
Suggested change:

```diff
-3. Set up other environment variables:
-```bash
+3. Set up other environment variables (from the repository root, first change to this directory):
+```bash
+cd CodeGen/docker_compose/intel/cpu/xeon
```
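If a directory-independent form is preferred, one possible sketch is shown below; the `git rev-parse` approach is an assumption, not part of the current README, and the script path is taken from the PR's file list:

```bash
# Resolve the repository root regardless of the current working directory,
# then source the openGauss environment script from its location in this PR.
REPO_ROOT=$(git rev-parse --show-toplevel)
source "${REPO_ROOT}/CodeGen/docker_compose/intel/set_env_opengauss.sh"
```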
~~~markdown
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps
```

### 1. Build Retriever Image

```bash
docker build --no-cache -t opea/retriever:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/src/Dockerfile .
```

### 2. Build Dataprep Image

```bash
docker build --no-cache -t opea/dataprep:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/src/Dockerfile .
cd ..
```
~~~
Copilot AI · Jan 28, 2026
The docker build commands reference the GenAIComps repository but don't specify where it should be cloned or what the current working directory should be. The instructions would be clearer if they specified to run these commands from within the cloned GenAIComps directory, or provided the full command sequence including directory changes.
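A possible clarified sequence, sketched from the commands quoted above rather than taken from the PR itself; the clone location and image tags match the existing instructions:

```bash
# Clone GenAIComps and build both images from inside the cloned directory,
# so the relative Dockerfile paths (comps/...) resolve correctly.
git clone https://github.com/opea-project/GenAIComps.git
cd GenAIComps

docker build --no-cache -t opea/retriever:latest \
  --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy \
  -f comps/retrievers/src/Dockerfile .

docker build --no-cache -t opea/dataprep:latest \
  --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy \
  -f comps/dataprep/src/Dockerfile .

# Return to the previous directory once the images are built.
cd ..
```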
```bash
export http_proxy="Your_HTTP_Proxy"
export https_proxy="Your_HTTPS_Proxy"
# Example: no_proxy="localhost,127.0.0.1,192.168.1.1"
export no_proxy="Your_No_Proxy",codegen-xeon-ui-server,codegen-xeon-backend-server,dataprep-opengauss-server,tei-embedding-serving,retriever-opengauss-server,vllm-server
```
Copilot AI · Jan 28, 2026
The no_proxy export statement has awkward quoting (line 41). The quoted string ends at "Your_No_Proxy" and the comma-separated list of service names sits outside the quotes. Because there are no spaces, the shell still concatenates the adjacent words into a single value, but the mixed quoting is confusing and fragile. The entire value should be quoted together, or the quotes removed entirely. For example: export no_proxy="Your_No_Proxy,codegen-xeon-ui-server,codegen-xeon-backend-server,..."
Suggested change:

```diff
-export no_proxy="Your_No_Proxy",codegen-xeon-ui-server,codegen-xeon-backend-server,dataprep-opengauss-server,tei-embedding-serving,retriever-opengauss-server,vllm-server
+export no_proxy="Your_No_Proxy,codegen-xeon-ui-server,codegen-xeon-backend-server,dataprep-opengauss-server,tei-embedding-serving,retriever-opengauss-server,vllm-server"
```
```yaml
  tei-embedding-serving:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.7
    container_name: tei-embedding-serving
    entrypoint: /bin/sh -c "apt-get update && apt-get install -y curl && text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
```
Copilot AI · Jan 28, 2026
The entrypoint for tei-embedding-serving modifies the container at runtime by installing curl (line 49: 'apt-get update && apt-get install -y curl'). This adds unnecessary startup time and makes the container less reproducible. The curl installation is only needed for the healthcheck. Consider using a custom image with curl pre-installed, or use a healthcheck that doesn't require curl, such as using the native test command or wget if it's available.
Suggested change:

```diff
-    entrypoint: /bin/sh -c "apt-get update && apt-get install -y curl && text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
+    entrypoint: /bin/sh -c "text-embeddings-router --json-output --model-id ${EMBEDDING_MODEL_ID} --auto-truncate"
```
```bash
# openGauss configuration
export GS_USER="gaussdb"
export GS_PASSWORD="openGauss@123"
export GS_DB="postgres"
export GS_CONNECTION_STRING="opengauss+psycopg2://${GS_USER}:${GS_PASSWORD}@$ip_address:5432/${GS_DB}"
```
Copilot AI · Jan 28, 2026
The openGauss configuration variables (GS_USER, GS_PASSWORD, GS_DB, GS_CONNECTION_STRING) are set in the test script (lines 43-46) before sourcing set_env_opengauss.sh (line 48), but set_env_opengauss.sh exports these same variables with hardcoded values (lines 40-44 of set_env_opengauss.sh), effectively overriding the test script's values. Since the values are identical, this works but is redundant. Consider removing the duplicate exports from the test script or modifying set_env_opengauss.sh to respect pre-existing values using the pattern: export GS_USER="${GS_USER:-gaussdb}".
Suggested change:

```diff
-# openGauss configuration
-export GS_USER="gaussdb"
-export GS_PASSWORD="openGauss@123"
-export GS_DB="postgres"
-export GS_CONNECTION_STRING="opengauss+psycopg2://${GS_USER}:${GS_PASSWORD}@$ip_address:5432/${GS_DB}"
+# openGauss configuration is handled by set_env_opengauss.sh
```
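The other option the comment mentions, letting set_env_opengauss.sh respect pre-existing values, could look like this sketch; the variable names and defaults are taken from the excerpt above, while the use of ${HOST_IP} in the connection string is an assumption about the script's conventions:

```bash
# In set_env_opengauss.sh: keep any value already exported by the caller,
# otherwise fall back to the documented defaults.
export GS_USER="${GS_USER:-gaussdb}"
export GS_PASSWORD="${GS_PASSWORD:-openGauss@123}"
export GS_DB="${GS_DB:-postgres}"
export GS_CONNECTION_STRING="${GS_CONNECTION_STRING:-opengauss+psycopg2://${GS_USER}:${GS_PASSWORD}@${HOST_IP}:5432/${GS_DB}}"
```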
```bash
n=0
until [[ "$n" -ge 100 ]]; do
    docker logs vllm-server > ${LOG_PATH}/vllm_service_start.log 2>&1
    if grep -q complete ${LOG_PATH}/vllm_service_start.log; then
        break
    fi
    sleep 5s
    n=$((n+1))
done
```
Copilot AI · Jan 28, 2026
The script waits for the vllm-server to be ready by checking logs for the word 'complete', but the loop runs up to 100 iterations with 5-second sleeps (total 500 seconds / ~8.3 minutes). If the service fails to start within this time, the script continues without error checking. After the loop, there should be a check to verify if the service actually started successfully, and exit with an error if it didn't.
Suggested change:

```diff
-n=0
-until [[ "$n" -ge 100 ]]; do
-    docker logs vllm-server > ${LOG_PATH}/vllm_service_start.log 2>&1
-    if grep -q complete ${LOG_PATH}/vllm_service_start.log; then
-        break
-    fi
-    sleep 5s
-    n=$((n+1))
-done
+n=0
+vllm_ready=false
+until [[ "$n" -ge 100 ]]; do
+    docker logs vllm-server > ${LOG_PATH}/vllm_service_start.log 2>&1
+    if grep -q complete ${LOG_PATH}/vllm_service_start.log; then
+        vllm_ready=true
+        break
+    fi
+    sleep 5s
+    n=$((n+1))
+done
+if [ "$vllm_ready" != true ]; then
+    echo "ERROR: vllm-server did not become ready within the expected time."
+    echo "Last vllm-server logs:"
+    cat ${LOG_PATH}/vllm_service_start.log || true
+    exit 1
+fi
```
| "tei-embedding-serving" \ | ||
| '{"inputs":"What is Deep Learning?"}' | ||
|
|
||
| sleep 1m # retrieval can't curl as expected, try to wait for more time |
Copilot AI · Jan 28, 2026
The 1-minute sleep (line 117) with a comment 'retrieval can't curl as expected, try to wait for more time' indicates a timing issue that's being worked around rather than properly fixed. This arbitrary delay makes tests slow and unreliable. Consider implementing a proper retry mechanism with health checks for the retrieval service instead of a fixed delay.
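One way such a retry could look, as a sketch rather than the PR's code; the health endpoint, path, and default port are assumptions made for illustration and should be adjusted to the actual retrieval service API:

```bash
# Poll the retriever service until it responds, instead of a fixed 1-minute sleep.
# RETRIEVER_PORT and the /v1/health_check path are assumptions for illustration.
retriever_ready=false
for i in $(seq 1 30); do
    if curl -sf "http://${ip_address}:${RETRIEVER_PORT:-7000}/v1/health_check" > /dev/null; then
        retriever_ready=true
        break
    fi
    sleep 5s
done
if [ "$retriever_ready" != true ]; then
    echo "ERROR: retriever service did not become ready in time."
    exit 1
fi
```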
```bash
export TEI_EMBEDDING_HOST_IP=${HOST_IP}
export TEI_EMBEDDING_ENDPOINT="http://${HOST_IP}:${TEI_EMBEDDER_PORT}"

# General Configuration
```
Copilot AI · Jan 28, 2026
The LOGFLAG variable is set to 'false' in set_env_opengauss.sh (line 63) but is overridden to 'true' in the test script (line 39 of test_compose_opengauss_on_xeon.sh). This creates inconsistent behavior between manual deployment (using set_env_opengauss.sh directly) and automated testing. Consider documenting this difference or ensuring consistency.
Suggested change:

```diff
-# General Configuration
+# General Configuration
+# Default logging behavior for manual deployment. Automated test scripts
+# (e.g., test_compose_opengauss_on_xeon.sh) may override LOGFLAG to 'true'
+# to enable additional logging during testing.
```
This pull request introduces a comprehensive setup for deploying the CodeGen application with openGauss as the vector database on Intel Xeon CPUs. It adds new deployment documentation, a Docker Compose configuration for orchestrating all required services, an environment variable setup script, and an automated test script to validate the deployment. These changes enable users to easily build, configure, deploy, and test the full GenAIComps pipeline—including embedding, retrieval, LLM, and UI services—on Intel Xeon platforms with openGauss integration.
Deployment and Configuration:
- A new deployment guide (README_opengauss.md) describing environment variable setup, Docker Compose usage, image building, service validation, and UI access for CodeGen with openGauss on Intel Xeon.
- A Docker Compose configuration (compose_opengauss.yaml) that defines all necessary services (openGauss DB, dataprep, embedding, retriever, LLM, backend, and UI), with health checks, environment variables, and port mappings for seamless orchestration.
- An environment setup script (set_env_opengauss.sh) to export all required variables for service interconnection, model configuration, and proxy/network settings.

Testing and Validation:
- An automated test script (test_compose_opengauss_on_xeon.sh) that builds images, launches services, and validates the functionality of each microservice and the overall pipeline via API calls and response checks.