perf(docker): improve all images build cache efficiency#3025

Merged
imbajin merged 8 commits into apache:master from bitflicker64:improve-docker-build-cache
May 18, 2026
Conversation

Contributor

@bitflicker64 bitflicker64 commented May 13, 2026

Purpose of the PR

close #2977

Main Changes

.dockerignore (new file)

  • Excludes Docker build-context noise explicitly because .dockerignore does not inherit .gitignore
  • Filters local build outputs, extracted release directories, archives, IDE/OS files, logs, generated artifacts, env-local files, git internals, compose files, and docs

All 4 Dockerfiles

  • Added --mount=type=cache,target=/root/.m2 on mvn package — deps cached across repeated builds
  • Kept vim and cron in the runtime images to preserve existing container debugging workflows and the current start-hugegraph.sh -m true monitor path
  • Removed ineffective build-time service cron start, rm /var/lib/dpkg/info/libc-bin.*, and other dead cleanup/debug commands from runtime stages

Note: dependency:go-offline not used — inter-module deps like hg-pd-client are not on Maven Central and would cause it to fail.
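
Whether every Dockerfile actually carries the cache mount can be checked with a plain text search before building. The following is a minimal sketch, assuming a POSIX shell with grep; the function name missing_cache_mount is hypothetical, and the search string is the mount option quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print each given file that does NOT contain the
# Maven cache mount option described in this PR.
missing_cache_mount() {
    for f in "$@"; do
        # -F treats the pattern as a fixed string; -e protects the leading dashes.
        grep -qF -e '--mount=type=cache,target=/root/.m2' "$f" || printf '%s\n' "$f"
    done
}
```

Run from the repository root, for example as `missing_cache_mount hugegraph-pd/Dockerfile hugegraph-store/Dockerfile hugegraph-server/Dockerfile hugegraph-server/Dockerfile-hstore`. Empty output means all four files contain the option. This only confirms the text is present; it does not show that BuildKit reuses the cache.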

Verifying these changes

  • Need tests and can be verified as follows:
# Build all 4 images
docker build -f hugegraph-pd/Dockerfile        -t hugegraph/pd:test .  
docker build -f hugegraph-store/Dockerfile     -t hugegraph/store:test .  
docker build -f hugegraph-server/Dockerfile    -t hugegraph/server:test .  
docker build -f hugegraph-server/Dockerfile-hstore -t hugegraph/server-hstore:test .  
  
# Verify runtime contents and cron in each image
docker run --rm --entrypoint /bin/bash hugegraph/pd:test    -c "ls /hugegraph-pd/    && which cron"  
docker run --rm --entrypoint /bin/bash hugegraph/store:test -c "ls /hugegraph-store/ && which cron"  
docker run --rm --entrypoint /bin/bash hugegraph/server:test -c "ls /hugegraph-server/ && which cron"  
  
# Full stack health check (pull_policy: always requires --pull never for local images)
HUGEGRAPH_VERSION=test docker compose -f docker/docker-compose.yml up --pull never  
# All 3 containers should show (healthy) in docker ps 
  • Tested and verified:
      • All 4 Dockerfiles build successfully (Maven deps cached via --mount=type=cache)
      • Runtime contents and cron confirmed in all images
      • Full stack health check passed; all 3 containers came up (healthy):
          • hugegraph/pd:test: healthy on port 8620
          • hugegraph/store:test: healthy on port 8520
          • hugegraph/server:test: healthy on port 8080

Does this PR potentially affect the following parts?

  • Dependencies (add/update license info & regenerate_known_dependencies.sh)
  • Modify configurations
  • The public API
  • Other effects (typed here)
  • Nope

Documentation Status

  • Doc - No Need

dosubot (bot) added the size:L, ci-cd, and perf labels on May 13, 2026
codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 1.56%. Comparing base (e108076) to head (0952764).
⚠️ Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (e108076) and HEAD (0952764): HEAD has 1 upload fewer than BASE (BASE: 3 uploads, HEAD: 2 uploads).
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #3025       +/-   ##
============================================
- Coverage     35.85%   1.56%   -34.30%     
+ Complexity      338      43      -295     
============================================
  Files           802     781       -21     
  Lines         67995   65524     -2471     
  Branches       8902    8457      -445     
============================================
- Hits          24381    1026    -23355     
- Misses        41008   64412    +23404     
+ Partials       2606      86     -2520     

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to improve Docker build cache efficiency for HugeGraph PD, Store, and Server images by adding Docker context filtering and BuildKit/Maven cache-related Dockerfile changes.

Changes:

  • Adds a new root .dockerignore.
  • Updates four Dockerfiles to use a pom-only copy layer and Maven cache mounts.
  • Removes some runtime packages and cleanup/startup commands from image build stages.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
.dockerignore Adds Docker build-context exclusions.
hugegraph-pd/Dockerfile Refactors PD image build layers and runtime dependency install.
hugegraph-store/Dockerfile Refactors Store image build layers and runtime dependency install.
hugegraph-server/Dockerfile Refactors Server image build layers and runtime dependency install.
hugegraph-server/Dockerfile-hstore Refactors hstore Server image build layers and runtime dependency install.

Comment thread hugegraph-pd/Dockerfile Outdated
Comment thread hugegraph-store/Dockerfile Outdated
Comment thread hugegraph-server/Dockerfile Outdated
Comment thread hugegraph-server/Dockerfile-hstore Outdated
Comment thread hugegraph-pd/Dockerfile Outdated
Comment thread .dockerignore Outdated
Comment thread hugegraph-pd/Dockerfile Outdated
Comment thread hugegraph-store/Dockerfile Outdated
Comment thread hugegraph-server/Dockerfile Outdated
Comment thread hugegraph-server/Dockerfile-hstore Outdated
- Add .dockerignore to reduce build context noise (defer to .gitignore,
  add **/target/ and Docker-specific extras explicitly)
- Refactor all 4 Dockerfiles: pom-first COPY layer + BuildKit .m2 cache mount
- Add # syntax=docker/dockerfile:1 for compatibility with older Docker versions
- Restore cron in runtime stage to preserve optional monitor support
- Remove dead code: vim, service cron start, rm libc-bin hack, set -x, pwd&&cd
- Update server-ci.yml ubuntu-22.04 -> ubuntu-latest (resolves existing TODO)
@bitflicker64 bitflicker64 requested a review from Copilot May 14, 2026 12:33
Member

@imbajin imbajin left a comment

Direction is right but a few blockers remain. See inline for net-new findings; +1'd Copilot's threads on the labs syntax / pom-only layer / root-pom glob (those are correct). Body-level: README/BUILDING should also note the new BuildKit ≥ 0.20 requirement, and removing vim is a behavioral change worth calling out in the commit body for ops folks who exec into containers.

Comment thread .dockerignore Outdated
# NOTE: This file intentionally stays minimal.
**/target/
# Most patterns (IDE files, build artifacts, logs, OS files) are already
# covered by .gitignore. Only Docker-specific extras are listed here.
Member

‼️ The premise here is incorrect — Docker's build context exclusion does not inherit .gitignore. As a result, locally-extracted release dirs (apache-hugegraph-*/, measured at ~790 MB on a normal dev checkout), .idea/, node_modules/, logs/, *.iml, *.class, gen-java/, upload-files/, .vscode/, *.tar.gz, dist/, build/ all leak into every build's context, defeating the cache work in this PR (and risking IDE / secret leakage into image layers).

Suggested change
# covered by .gitignore. Only Docker-specific extras are listed here.
**/target/
# IMPORTANT: .dockerignore does NOT inherit .gitignore — patterns must be restated.
# Pre-extracted release dirs / archives
apache-hugegraph-*/
**/*.tar
**/*.tar.gz*
**/*.zip
**/*.war
# IDE / OS
.idea/
.vscode/
**/*.iml
**/*.iws
**/.DS_Store
# Build / runtime artifacts
**/logs/
**/*.log
**/*.class
**/gen-java/
**/upload-files/
**/dist/
**/build/
**/node_modules/
# Env files
.env.local
.env.*.local
# Git internals
.git
.gitignore
.gitattributes
.github
# Compose / docs not needed in build context
**/docker-compose*.yml
**/docker-compose*.yaml
**/*.md
docs/
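
One way to see what a missing rule costs is to total the size of the directories these rules are meant to keep out of the build context. The following is a minimal sketch, assuming a POSIX shell with find and du; the function name context_leak_kb is hypothetical, and the name list is only a sample taken from the suggestion above. It is a rough size estimate and does not emulate Docker's own pattern matching.

```shell
#!/bin/sh
# Hypothetical helper: print the size in KB of each directory under ROOT
# whose name matches a sample of the patterns suggested for .dockerignore.
context_leak_kb() {
    root="${1:-.}"
    # -prune stops find from descending into a matched directory,
    # so nested matches are not counted twice.
    find "$root" -type d \( -name target -o -name node_modules -o -name logs \
        -o -name .idea -o -name 'apache-hugegraph-*' \) -prune -exec du -sk {} +
}
```

Calling `context_leak_kb .` from the repository root lists one line per matching directory; large entries are the ones worth excluding first.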

Comment thread .github/workflows/server-ci.yml Outdated
build-server:
# TODO: we need test & replace it to ubuntu-24.04 or ubuntu-latest
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
Member

⚠️ Two concerns on this line:

  1. Scope creep: bumping ubuntu-22.04 → ubuntu-latest is unrelated to the Docker cache work in this PR. The original # TODO was a deliberate guard (Ubuntu 24.04 changed default Java, removed legacy packages). Suggest reverting in this PR and splitting into a focused follow-up.
  2. ubuntu-latest drifts when GitHub flips the alias — pin explicitly:
Suggested change
runs-on: ubuntu-latest
runs-on: ubuntu-24.04

Flagging here since it's the only CI file touched: there is no CI workflow that exercises any Dockerfile (verified — zero docker build references under .github/workflows/). PR's verification is "run docker build locally". Future regressions will ship silently. Consider adding a build-only smoke job:

docker-build:
  runs-on: ubuntu-24.04
  strategy:
    matrix:
      dockerfile:
        - hugegraph-pd/Dockerfile
        - hugegraph-store/Dockerfile
        - hugegraph-server/Dockerfile
        - hugegraph-server/Dockerfile-hstore
  steps:
    - uses: actions/checkout@v4
    - uses: docker/setup-buildx-action@v3
    - run: docker buildx build -f ${{ matrix.dockerfile }} --load .

Contributor Author

Reverted it. I was mainly testing what would break with ubuntu-latest. Also added a smoke test 🫡

dumb-init \
procps \
curl \
lsof \
Member

🧹 Pre-existing issue (not a regression from this PR, but worth fixing while in the area): the cron package stays installed in the runtime image (~3 MB), but no cron daemon is ever started — dumb-init only execs docker-entrypoint.sh. This means start-hugegraph.sh -m true silently fails to fire its scheduled monitor job, even though crontab_append returns success. Either:

  • Drop cron entirely (saves ~3 MB and removes the false impression that monitor mode works), or
  • Start cron in docker-entrypoint.sh (cron & before the server start) so the feature actually works.

Same applies to Dockerfile-hstore and the PD / Store Dockerfiles.

Contributor Author

Removed cron from the runtime images for now, since properly enabling it would expand the scope of this PR. Happy to work on enabling this feature in a follow-up PR if you want.

- remove ineffective pom-only COPY optimization
- remove unused cron package from runtime images
- keep vim for container debugging workflows
- add Maven cache mounts for faster rebuilds
- expand .dockerignore to exclude IDE files, archives, logs, generated artifacts, and local build outputs
- simplify Docker build flow and reduce unnecessary build context size
@bitflicker64
Contributor Author

**/dist/ matches org/apache/hugegraph/dist/, a real Java source package directory.

The full path that gets excluded is:

hugegraph-server/hugegraph-dist/src/main/java/org/apache/hugegraph/dist/ 

The last path component is dist, so **/dist/ matches it and wipes the entire org.apache.hugegraph.dist package from the Docker build context before mvn package ever runs. So I used **/target/dist/ instead.
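
A collision like this can be checked before adding a basename pattern to .dockerignore. The following is a minimal sketch, assuming a POSIX shell with find; the function name dirs_named_outside_target is hypothetical. It lists every directory with the given name that does not sit under a target/ directory, which is roughly the set of paths a `**/<name>/` rule would remove beyond what `**/target/` already covers.

```shell
#!/bin/sh
# Hypothetical helper: list directories called NAME under ROOT,
# skipping anything nested inside a target/ directory.
dirs_named_outside_target() {
    name="$1"
    root="${2:-.}"
    find "$root" -type d -name "$name" ! -path '*/target/*'
}
```

Running `dirs_named_outside_target dist .` from the repository root should surface the source package path quoted above; any hit under src/main/java is a sign the pattern is too broad.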

@bitflicker64 bitflicker64 requested a review from imbajin May 16, 2026 06:57
@imbajin
Member

imbajin commented May 17, 2026

Thanks for iterating on the Docker cache changes. The cache direction looks good, and besides the monitor/runtime compatibility point below plus the small .dockerignore typo, I don't see other blockers from my side.

I think we should keep the runtime behavior unchanged in this PR.

Today the image still exposes the legacy monitor path:

start-hugegraph.sh -m true
        -> start-monitor.sh
        -> crontab registers monitor-hugegraph.sh every minute
        -> monitor-hugegraph.sh checks process + /versions
        -> restart HugeGraphServer if the process is gone or REST is unhealthy

After removing cron, that path is broken:

start-hugegraph.sh -m true
        -> start-monitor.sh
        -> crontab: command not found
        -> monitor is not registered

I agree that the crontab-based monitor is not the ideal long-term model for Docker. For container users, the better direction is probably a Docker-native lifecycle model:

docker-entrypoint.sh
        -> start HugeGraphServer
        -> watch server pid and optionally probe /versions
        -> exit the container when the server dies or stays unhealthy
        -> Docker restart policy restarts the container

That can then be paired with:

docker run --restart unless-stopped ...
Dockerfile HEALTHCHECK ...

But this is a runtime behavior/design change, not just a build-cache optimization. It also needs documentation, because standalone Docker healthcheck only marks the container as unhealthy and does not restart it by itself; users still need a restart policy, or an entrypoint watchdog that exits the container after repeated failures.
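
The entrypoint watchdog mentioned here could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's implementation: the function name watch_server and its parameters are hypothetical, and the probe is passed in as a command string so the sketch does not depend on any particular URL or tool.

```shell
#!/bin/sh
# Hypothetical watchdog: returns 1 when the watched process is gone, or 2
# when the probe has failed MAX_FAILURES times in a row. The caller can then
# exit the container so that a Docker restart policy takes over.
watch_server() {
    pid="$1"
    probe="$2"
    max_failures="${3:-3}"
    interval="${4:-10}"
    failures=0
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if sh -c "$probe" >/dev/null 2>&1; then
            failures=0
        else
            failures=$((failures + 1))
            if [ "$failures" -ge "$max_failures" ]; then
                return 2
            fi
        fi
        sleep "$interval"
    done
    return 1
}
```

In an entrypoint this might be wired up as `watch_server "$SERVER_PID" "curl -fsS http://127.0.0.1:8080/versions" 3 10 || exit 1`, where SERVER_PID is a hypothetical variable holding the server's process id, and the port and path are taken from this thread rather than verified against the image.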

So for this PR, I suggest the lowest-risk path:

  1. Keep cron in the runtime image for now, so the existing start-hugegraph.sh -m true monitor path is not broken.
  2. Remove only the build-time service cron start cleanup if needed, since starting a service during image build does not keep it running in containers.
  3. Fix the small .dockerignore typo: .gitattribut -> .gitattributes.
  4. Leave the Docker-native watchdog / healthcheck / restart-policy design to a follow-up PR with docs.

In short:

This PR:
  improve Docker build cache
  keep runtime behavior unchanged

Follow-up PR:
  replace legacy cron monitor with Docker-native watchdog / healthcheck docs

@bitflicker64
Contributor Author

bitflicker64 commented May 17, 2026

Thanks for iterating on the Docker cache changes. The cache direction looks good, and besides the monitor/runtime compatibility point below plus the small .dockerignore typo, I don't see other blockers from my side.

Thanks for the review!

I think we should keep the runtime behavior unchanged in this PR.

Today the image still exposes the legacy monitor path:
...........
...........
...........
But this is a runtime behavior/design change, not just a build-cache optimization. It also needs documentation, because standalone Docker healthcheck only marks the container as unhealthy and does not restart it by itself; users still need a restart policy, or an entrypoint watchdog that exits the container after repeated failures.

Makes sense.

So for this PR, I suggest the lowest-risk path:

  1. Keep cron in the runtime image for now, so the existing start-hugegraph.sh -m true monitor path is not broken.
  2. Remove only the build-time service cron start cleanup if needed, since starting a service during image build does not keep it running in containers.
  3. Fix the small .dockerignore typo: .gitattribut -> .gitattributes.
  4. Leave the Docker-native watchdog / healthcheck / restart-policy design to a follow-up PR with docs.

Should I also remove **/target/dist/, since **/target/ already covers it? I also wanted to ask about **/*.tar.gz*: should we change it to **/*.tar.gz to be safe?

In short:

This PR:
  improve Docker build cache
  keep runtime behavior unchanged

Follow-up PR:
  replace legacy cron monitor with Docker-native watchdog / healthcheck docs

I'll make a follow-up PR as soon as my end-of-semester exams are over 🫡

- Fix .gitattribut -> .gitattributes typo in .dockerignore
- Fix **/*.tar.gz* -> **/*.tar.gz (remove unintended trailing wildcard)
- Remove **/target/dist/ (redundant, already covered by **/target/)
- Restore cron to apt-get install in all 4 Dockerfiles to keep the
  existing start-hugegraph.sh -m true monitor path working
@imbajin
Member

imbajin commented May 17, 2026

Yes, **/target/dist/ can be removed since **/target/ already covers it.

For the archive patterns, I think it is fine to exclude local *.tar.gz / *.tgz / similar generated archives from the Docker build context. The Dockerfiles should rebuild the distribution packages inside the build stage, so the image build should not depend on pre-existing local release archives.

The only thing I would like to confirm is validation: please make sure all four Docker images still build successfully and the new images can start normally after these ignore rules are applied. If that is verified, excluding these archive files sounds good to me.

@bitflicker64
Contributor Author

bitflicker64 commented May 17, 2026

The only thing I would like to confirm is validation: please make sure all four Docker images still build successfully and the new images can start normally after these ignore rules are applied. If that is verified, excluding these archive files sounds good to me.

Validated all 4 images locally: all build successfully and the runtime contents look correct (artifacts in place, cron installed). The .dockerignore archive patterns don't exclude anything the build actually needs, since everything gets rebuilt from source inside the build stage anyway.

Edit: tagged the new images and tested with compose; all three containers come up healthy.

Member

@imbajin imbajin left a comment

Thanks for the follow-up validation. The Docker build CI is green, the four images were validated locally, and the compose runtime check confirms the containers come up healthy. I also updated the PR description to match the current implementation: vim / cron are kept, while the ineffective build-time service cron start and dead cleanup/debug commands are removed.

Looks good to me now.

dosubot (bot) added the lgtm label on May 18, 2026
imbajin changed the title from "perf(docker): improve pd/store/server image build cache efficiency" to "perf(docker): improve all images build cache efficiency" on May 18, 2026
@imbajin imbajin merged commit 8d095e1 into apache:master May 18, 2026
23 of 24 checks passed

Labels

ci-cd, lgtm, perf, size:L

Development

Successfully merging this pull request may close these issues.

[Task] Improve Docker build cache efficiency for pd/store/server images

3 participants