
Extend CacheRuntime phase 2.3: remount when dataset mount changed or cache runtime master restarted #5834

Open. xliuqq wants to merge 4 commits into fluid-cloudnative:master from xliuqq:sync_remount

Collaborator

@xliuqq xliuqq commented May 6, 2026

Ⅰ. Describe what this PR does

Remount when the dataset mounts change or the cache runtime master restarts.

Ⅱ. Does this pull request fix one issue?

part of #5412

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Unit tests for the Sync method.

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

Signed-off-by: xliuqq <xlzq1992@gmail.com>

update openapi and crd

Signed-off-by: xliuqq <xlzq1992@gmail.com>

Add script update and documentation

Signed-off-by: xliuqq <xlzq1992@gmail.com>

fix test

Signed-off-by: xliuqq <xlzq1992@gmail.com>

add test

fluid-e2e-bot Bot commented May 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yangyuliufeng for approval by writing /assign @yangyuliufeng in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements dynamic UFS mount updates for the generic cache runtime by executing a MountUFS script and parsing its JSON output to synchronize dataset status. The changes include API updates to CacheRuntimeStatus, new documentation for integration requirements, and the core logic in the engine to handle UFS changes and remounting. Review feedback identifies a high-severity issue where missing updates to dataset.Status.Mounts could cause infinite reconciliation loops. Other suggestions focus on making JSON parsing more robust, improving logging conciseness by avoiding full object serialization, and fixing typos in API documentation.

Comment thread pkg/ddc/cache/engine/ufs.go
Comment thread api/v1alpha1/cacheruntime_status.go Outdated
Comment thread pkg/ddc/cache/engine/ufs.go
Comment thread pkg/ddc/cache/engine/ufs.go Outdated
Comment thread pkg/ddc/cache/engine/ufs.go Outdated
Contributor

Copilot AI left a comment

Pull request overview

This PR extends the generic CacheRuntime “MountUFS” integration so the cache runtime can remount dynamically when Dataset.spec.mounts changes or when the master pod restarts, using a MountUFS script whose stdout is parsed as JSON to sync mount state.

Changes:

  • Add CacheEngine UFS change detection + remount-on-master-restart logic, driven from CacheEngine.Sync().
  • Introduce/propagate status.mountTime and related status/CRD updates to track the last successful mount time.
  • Update Curvine e2e MountUFS script + docs to require MountUFS stdout to be JSON matching CacheRuntimeMountUfsOutput, and add Sync unit tests.
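The stdout contract summarized above could be consumed roughly as in the following sketch. It is illustrative only: `mountUfsOutput` and `parseMountOutput` are hypothetical names, not the PR's `CacheRuntimeMountUfsOutput` type or its parsing code, and the element type of `mounted` is assumed to be a list of strings. Only the `mounted` key comes from the e2e script description.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// mountUfsOutput is a hypothetical stand-in for CacheRuntimeMountUfsOutput.
// Only the "mounted" key is taken from the PR; the []string element type is an assumption.
type mountUfsOutput struct {
	Mounted []string `json:"mounted"`
}

// parseMountOutput decodes the script's stdout and rejects output that is
// empty, not valid JSON, or missing the "mounted" list.
func parseMountOutput(stdout string) (*mountUfsOutput, error) {
	trimmed := strings.TrimSpace(stdout)
	if trimmed == "" {
		return nil, fmt.Errorf("MountUFS produced no output")
	}
	var out mountUfsOutput
	if err := json.NewDecoder(strings.NewReader(trimmed)).Decode(&out); err != nil {
		return nil, fmt.Errorf("MountUFS stdout is not valid JSON: %w", err)
	}
	if out.Mounted == nil {
		return nil, fmt.Errorf("MountUFS output is missing the \"mounted\" field")
	}
	return &out, nil
}

func main() {
	// A script that keeps stdout quiet and prints only the JSON document.
	out, err := parseMountOutput(`{"mounted":["/data/a","/data/b"]}` + "\n")
	fmt.Println(out.Mounted, err)

	// A script that prints a log line on stdout breaks the contract.
	_, err = parseMountOutput("mount ok")
	fmt.Println("rejected:", err != nil)
}
```

Failing fast on non-JSON stdout is one reason a mount script would send its diagnostics to stderr, as the Curvine script change does.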

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.

Show a summary per file
| File | Description |
| --- | --- |
| test/gha-e2e/curvine/mount.yaml | Adjust MountUFS script to be quiet on stdout and emit {"mounted":[...]} JSON for Fluid parsing. |
| pkg/ddc/cache/engine/ufs.go | Implement UFS delta detection, remount checks on master restart, MountUFS stdout JSON parsing, and dataset mount sync. |
| pkg/ddc/cache/engine/sync.go | Invoke UFS-change handling as part of CacheEngine Sync flow. |
| pkg/ddc/cache/engine/sync_test.go | Add Ginkgo unit tests covering Sync/configmap generation scenarios (currently limited coverage of new UFS path). |
| pkg/ddc/cache/engine/setup.go | Update Setup to call new PrepareUFS(runtimeClass) signature. |
| pkg/ddc/cache/engine/runtime.go | Add updateMountTime() helper to persist last mount timestamp in CacheRuntime status. |
| pkg/ddc/cache/engine/master.go | Change master pod/container resolution to derive container name from CacheRuntimeClass template. |
| pkg/ddc/cache/engine/fileutils.go | Return Mount command stdout so MountUFS output can be parsed by controller logic. |
| pkg/ddc/cache/engine/dataset.go | Update runtime MountTime when binding dataset. |
| docs/zh/dev/generic_cache_runtime_integration.md | Document MountUFS JSON stdout contract and usage scenarios (Step 2.7). |
| docs/en/dev/generic_cache_runtime_integration.md | English version of MountUFS JSON stdout contract and guidance (Step 2.7). |
| config/crd/bases/data.fluid.io_cacheruntimes.yaml | CRD schema updates for status.mountTime + status.mounts shape. |
| charts/fluid/fluid/crds/data.fluid.io_cacheruntimes.yaml | Helm CRD schema mirror of the same status changes. |
| api/v1alpha1/zz_generated.deepcopy.go | Deepcopy updates for new/changed status fields and MountUFS output type. |
| api/v1alpha1/status.go | Update RuntimeStatus comments around mounts (minor doc tweak). |
| api/v1alpha1/common.go | Add CacheRuntimeMountUfsOutput type definition. |
| api/v1alpha1/cacheruntimeclass_types.go | Update MountUFS comment to indicate JSON output requirement. |
| api/v1alpha1/cacheruntime_status.go | Replace older mount status struct with Mounts + MountTime fields on CacheRuntime status. |

Comment on lines +35 to +36
[ -z "$mountPoint" ] && { echo "mountPoint is not set or empty" >&2; exit 1; }
[ -z "$path" ] && { echo "path is not set or empty" >&2; exit 1; }
Comment thread pkg/ddc/cache/engine/ufs.go Outdated
Comment thread api/v1alpha1/cacheruntime_status.go Outdated
Comment thread api/v1alpha1/cacheruntimeclass_types.go Outdated
Comment on lines +226 to +231
err := engine.Sync(ctx)
// Sync may fail at UpdateOnUFSChange due to missing master pod, but ConfigMap should be created

// Verify ConfigMap was created
configMap := &corev1.ConfigMap{}
err = fakeClient.Get(context.Background(), types.NamespacedName{
Comment thread pkg/ddc/cache/engine/sync.go
Comment thread pkg/ddc/cache/engine/sync_test.go
Comment thread pkg/ddc/cache/engine/sync_test.go
Comment thread pkg/ddc/cache/engine/sync_test.go
Comment thread pkg/ddc/cache/engine/sync_test.go

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 61.66667% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.64%. Comparing base (42e7392) to head (c1c46e2).
⚠️ Report is 9 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/ddc/cache/engine/ufs.go | 65.00% | 37 Missing and 12 partials ⚠️ |
| pkg/ddc/cache/engine/runtime.go | 52.63% | 6 Missing and 3 partials ⚠️ |
| pkg/ddc/cache/engine/dataset.go | 0.00% | 5 Missing ⚠️ |
| pkg/ddc/cache/engine/sync.go | 20.00% | 3 Missing and 1 partial ⚠️ |
| pkg/ddc/cache/engine/setup.go | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5834      +/-   ##
==========================================
+ Coverage   59.10%   59.64%   +0.53%     
==========================================
  Files         480      480              
  Lines       32511    32668     +157     
==========================================
+ Hits        19215    19484     +269     
+ Misses      11746    11607     -139     
- Partials     1550     1577      +27     

Collaborator

cheyang commented May 7, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for dynamic UFS mount updates and status synchronization within the generic cache runtime. Key changes include refactoring CacheRuntimeStatus to simplify mount tracking, implementing logic to detect dataset mount changes or master pod restarts, and requiring mount scripts to output status in a specific JSON format (CacheRuntimeMountUfsOutput). Comprehensive unit tests and documentation updates were also provided. Review feedback highlights opportunities to optimize performance by passing the CacheRuntime object through the call stack to avoid redundant API server requests in the UpdateOnUFSChange, shouldUpdateUFS, and checkIfRemountRequired methods.
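The optimization suggested above, fetching the CacheRuntime once per Sync and threading it through the helpers, might look like the sketch below. All types and function bodies here are hypothetical placeholders; only the helper names mirror those mentioned in the review, and the real signatures in the PR may differ.

```go
package main

import "fmt"

// cacheRuntime is a hypothetical, trimmed-down stand-in for the CacheRuntime object.
type cacheRuntime struct {
	Name string
}

// countingGetter simulates an API-server client and counts the requests it serves.
type countingGetter struct {
	calls int
}

func (g *countingGetter) getRuntime() *cacheRuntime {
	g.calls++
	return &cacheRuntime{Name: "demo"}
}

// The helpers accept the already-fetched runtime instead of fetching it again.
// Their bodies are placeholders, not the PR's logic.
func shouldUpdateUFS(rt *cacheRuntime) bool { return rt.Name != "" }

func checkIfRemountRequired(rt *cacheRuntime) bool { return rt.Name == "demo" }

// updateOnUFSChange receives the runtime from its caller and forwards it down.
func updateOnUFSChange(rt *cacheRuntime) (update, remount bool) {
	return shouldUpdateUFS(rt), checkIfRemountRequired(rt)
}

// syncOnce fetches the runtime exactly once per reconcile pass.
func syncOnce(g *countingGetter) (bool, bool) {
	rt := g.getRuntime()
	return updateOnUFSChange(rt)
}

func main() {
	g := &countingGetter{}
	syncOnce(g)
	fmt.Println("API requests per sync:", g.calls)
}
```

Besides saving requests, every helper in one pass then sees the same snapshot of the object.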

Comment thread pkg/ddc/cache/engine/ufs.go Outdated
Comment thread pkg/ddc/cache/engine/ufs.go Outdated
Comment thread pkg/ddc/cache/engine/ufs.go Outdated
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 19 changed files in this pull request and generated 9 comments.

Files not reviewed (2)
  • api/v1alpha1/openapi_generated.go: Language not supported
  • api/v1alpha1/zz_generated.deepcopy.go: Language not supported

Comment thread test/gha-e2e/curvine/mount.yaml
Comment thread docs/en/dev/generic_cache_runtime_integration.md Outdated
Comment thread docs/zh/dev/generic_cache_runtime_integration.md Outdated
Comment thread pkg/ddc/cache/engine/ufs.go
Comment thread pkg/ddc/cache/engine/ufs.go Outdated
Comment thread pkg/ddc/cache/engine/runtime.go
Comment thread api/v1alpha1/cacheruntime_status.go Outdated
Comment thread pkg/ddc/cache/engine/sync_test.go Outdated
Comment on lines +43 to +48
// handle ufs change - support dynamic mount updates
err = e.updateOnUFSChange(runtime)
if err != nil {
	e.Log.Error(err, "Failed to update UFS")
	return err
}
Collaborator

cheyang commented May 12, 2026

Review of PR #5834 — CacheRuntime phase 2.3: dynamic remount on dataset mount change or master restart.

Design direction is sound and the core chain (shouldUpdateUFS → PrepareUFS → SyncDatasetMounts) follows existing Fluid patterns. Two blocking issues need fixing before merge:

  1. Infinite reconcile loop risk: when updateOnUFSChange fails mid-way (e.g., JSON parse error), dataset stays in Updating phase with stale status.Mounts, causing shouldUpdateUFS to always return true on re-entry. Need a failure-path phase reset or backoff.
  2. updateMountTime silently swallows errors: if MountTime status update fails, checkIfRemountRequired will keep triggering unnecessary remounts every cycle.

Non-blocking: redundant runtimeClass fetch, checkIfRemountRequired error handling, config map timing, comment fixes. Test coverage for the new UFS path is critically low (4.67% patch coverage) — strongly recommend adding targeted unit tests before merge.
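The second blocking issue can be illustrated with a small sketch. It assumes, without confirmation from the PR, that the remount check compares the master's start time with the recorded `mountTime`. `runtimeStatus`, `statusWriter`, `failingWriter`, `okWriter`, and `remountRequired` are hypothetical names introduced for illustration; only `updateMountTime` mirrors a name from the PR, and its signature here is invented.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// runtimeStatus is a hypothetical stand-in for the CacheRuntime status.
type runtimeStatus struct {
	MountTime time.Time
}

// statusWriter abstracts the status update call so a failure can be simulated.
type statusWriter func(s *runtimeStatus, t time.Time) error

// failingWriter simulates a status update that the API server rejects.
func failingWriter(*runtimeStatus, time.Time) error {
	return errors.New("status update conflict")
}

// okWriter simulates a successful status update.
func okWriter(s *runtimeStatus, t time.Time) error {
	s.MountTime = t
	return nil
}

// masterStart is an arbitrary example timestamp for the master's (re)start.
var masterStart = time.Date(2026, 5, 12, 10, 0, 0, 0, time.UTC)

// updateMountTime returns the write error to the caller instead of only logging it.
func updateMountTime(write statusWriter, s *runtimeStatus, now time.Time) error {
	if err := write(s, now); err != nil {
		return fmt.Errorf("failed to persist mountTime: %w", err)
	}
	return nil
}

// remountRequired assumes a remount is needed when no mount was recorded
// or the master started after the last recorded mount.
func remountRequired(s *runtimeStatus, masterStarted time.Time) bool {
	return s.MountTime.IsZero() || masterStarted.After(s.MountTime)
}

func main() {
	status := &runtimeStatus{}
	mountedAt := masterStart.Add(time.Minute)

	err := updateMountTime(failingWriter, status, mountedAt)
	fmt.Println("error surfaced:", err != nil, "| remount required:", remountRequired(status, masterStart))

	err = updateMountTime(okWriter, status, mountedAt)
	fmt.Println("error surfaced:", err != nil, "| remount required:", remountRequired(status, masterStart))
}
```

When the write fails, the recorded time stays stale and the check keeps asking for a remount, so returning the error lets the caller retry the status update rather than the mount itself.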

}

// 2. set update status to updating
err = utils.UpdateMountStatus(e.Client, e.name, e.namespace, datav1alpha1.UpdatingDatasetPhase)
Collaborator

When updateOnUFSChange sets the dataset phase to Updating, the UpdateMountStatus Updating path does NOT update dataset.Status.Mounts (only phase + condition). If the subsequent MountUFS execution or SyncDatasetMounts fails, the reconciler will re-enter Sync → updateOnUFSChange → shouldUpdateUFS. Since AnalyzePathsDelta compares spec.Mounts vs status.Mounts and the status.Mounts remain stale, shouldUpdateUFS will always return true, causing an infinite reconcile loop with repeated MountUFS invocations.
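The stale-status condition described above can be reproduced with a simplified delta check. `pathsDelta` and `needsUpdate` are hypothetical helpers written for this illustration; they are not the signature or implementation of `AnalyzePathsDelta` or `shouldUpdateUFS`.

```go
package main

import "fmt"

// pathsDelta is a hypothetical simplification of comparing spec mounts with
// status mounts: it returns paths only in spec (added) and only in status (removed).
func pathsDelta(spec, status []string) (added, removed []string) {
	inSpec := make(map[string]bool, len(spec))
	for _, p := range spec {
		inSpec[p] = true
	}
	inStatus := make(map[string]bool, len(status))
	for _, p := range status {
		inStatus[p] = true
	}
	for _, p := range spec {
		if !inStatus[p] {
			added = append(added, p)
		}
	}
	for _, p := range status {
		if !inSpec[p] {
			removed = append(removed, p)
		}
	}
	return added, removed
}

// needsUpdate reports whether the delta between spec and status is non-empty.
func needsUpdate(spec, status []string) bool {
	added, removed := pathsDelta(spec, status)
	return len(added) > 0 || len(removed) > 0
}

func main() {
	spec := []string{"/data/a", "/data/b"}
	status := []string{"/data/a"} // stale: never refreshed because the mount step keeps failing

	for attempt := 1; attempt <= 3; attempt++ {
		fmt.Printf("reconcile %d: update needed = %v\n", attempt, needsUpdate(spec, status))
	}

	status = spec // what a successful sync of the status mounts would achieve
	fmt.Println("after status sync: update needed =", needsUpdate(spec, status))
}
```

As long as the status side is never written, every pass computes the same non-empty delta, which is the re-entry behavior the comment warns about.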

Consider adding a failure-path phase reset (e.g., revert to Bound on error) or a backoff/retry counter to break the cycle when the dataset gets stuck in Updating.
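One possible reading of this suggestion, sketched under stated assumptions: the phase is restored in a deferred function when the mount step fails, and a simple counter caps consecutive failures. `dataset`, `updateUFS`, `retryBudget`, and `errMount` are hypothetical; the PR may resolve the issue differently, and a real controller would more likely rely on requeue-with-backoff than on an in-memory counter.

```go
package main

import (
	"errors"
	"fmt"
)

type phase string

const (
	phaseBound    phase = "Bound"
	phaseUpdating phase = "Updating"
)

// dataset is a hypothetical stand-in that holds only the phase.
type dataset struct {
	Phase phase
}

// errMount is an example failure, such as unparseable MountUFS output.
var errMount = errors.New("invalid JSON from MountUFS")

// updateUFS sets the phase to Updating, runs the mount step, and restores
// the previous phase if that step fails.
func updateUFS(ds *dataset, mount func() error) (err error) {
	previous := ds.Phase
	ds.Phase = phaseUpdating
	defer func() {
		if err != nil {
			ds.Phase = previous // failure path: do not leave the dataset stuck in Updating
		}
	}()
	if err = mount(); err != nil {
		return fmt.Errorf("mount step failed: %w", err)
	}
	ds.Phase = phaseBound
	return nil
}

// retryBudget caps consecutive failed attempts (a hypothetical in-memory counter).
type retryBudget struct {
	failures, max int
}

func (b *retryBudget) allow() bool { return b.failures < b.max }

func (b *retryBudget) record(err error) {
	if err != nil {
		b.failures++
		return
	}
	b.failures = 0
}

func main() {
	ds := &dataset{Phase: phaseBound}
	budget := &retryBudget{max: 3}
	attempts := 0
	for budget.allow() {
		attempts++
		budget.record(updateUFS(ds, func() error { return errMount }))
	}
	fmt.Println("attempts:", attempts, "phase:", ds.Phase)
}
```

The phase reset alone does not stop re-entry, since the spec and status mounts still differ; it is the retry cap (or a backoff) that bounds the repeated MountUFS invocations.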

Comment thread pkg/ddc/cache/engine/runtime.go
}

return restart
}
Collaborator

updateOnUFSChange fetches runtimeClass via e.getRuntimeClass even though it could be passed in from the caller. Passing runtimeClass as a parameter would eliminate the redundant API call and reduce potential race conditions between multiple fetches of the same object.

Comment thread pkg/ddc/cache/engine/ufs.go
Comment thread pkg/ddc/cache/engine/ufs.go
xliuqq added 2 commits May 12, 2026 23:07
Signed-off-by: xliuqq <xlzq1992@gmail.com>
Signed-off-by: xliuqq <xlzq1992@gmail.com>

3 participants