Skip to content

TortuosityMLMG: recover linear scaling on porous geometry (alpha*a pin breaks MG coarsening) #289

@jameslehoux

Description

@jameslehoux

Summary

TortuosityMLMG no longer scales linearly on real porous microstructures. The bake-off in notebooks/profiling_and_tuning.ipynb and the python/tests/test_mlmg_porespy.py regression now return correct answers but with super-linear iteration growth, undermining the "MLMG is the default because it's the only O(N) option" rationale used to make MLMG the default solver (v4.3.0).

Background — how we got here

Three sequential issues, all fixed on branch claude/debug-mlmg-solver-GxIT6master:

  1. Original bug ([resolved]): TortuositySolverBase::generateActivityMask marks dead-end and isolated phase-target cells as non-percolating. TortuosityHypre decouples those rows in the matrix (A_ii=1, A_ij=0, rhs=0 at src/props/TortuosityHypre.cpp:1100). TortuosityMLMG did not — MLABecLaplacian's B-coefficient field was built straight from m_mf_diff_coeff (which is 1 on every phase-target cell) with no mask awareness. Non-percolating islands then formed Neumann subdomains with no Dirichlet contact; MLMG drove the local residual to zero but those potentials were indeterminate, breaking the boundary flux balance that TortuositySolverBase::value() audits → "converged but produced an invalid result" NaN on every porespy run.

  2. Pin-induced silent abort ([resolved]): pinning inactive cells via setScalars(1.0, 1.0) + acoef = 1 on inactive cells + harmonic-mean B-faces zeroed at active/inactive interfaces gave a well-posed operator, but MLMG default behaviour on non-convergence is amrex::Abort (SIGABRT), not throw. The existing try { mlmg.solve(...) } catch (std::exception&) did nothing useful → kernel died silently. Fixed by mlmg.setThrowException(true).

  3. Unreachable tolerance ([resolved]): eps = 1e-11 default was chosen to keep the integrated per-cell residual under the 1e-4 boundary-flux conservation tolerance. With the pin, MLMG stalls around 1e-5 (~5% reduction per V-cycle) and never gets to 1e-11. Loosened to 1e-7 with maxiter = 400 — still three orders below the flux tolerance, so correctness unaffected.

The remaining problem

After (1)–(3), MLMG on porespy 64³ runs to completion with a correct answer, but: the V-cycles only give ~5% residual reduction each, not the order-of-magnitude per cycle that healthy geometric multigrid produces. Iteration count grows with problem size — this is the super-linear scaling we now see in §4 of the profiling notebook on heterogeneous data.

Root cause is structural, not a bug we can patch around: MLABecLaplacian's geometric coarsening averages the a-coefficient and B-faces across fine cells per coarse cell. A coarse cell covering a mix of active (a=0) and inactive (a=1) fine cells gets a partial pin (a≈0.5) and partially-zeroed B-faces. That coarse operator does not faithfully represent the restriction of the fine operator's action on the active subdomain. V-cycle correction therefore contains components that the fine smoother cannot fix, and convergence stagnates.

This is the same reason pcg+pfmg / gmres+pfmg fail in the bake-off: HYPRE's structured PFMG semicoarsens through the same masked rows and stalls.

What scales where (post-fix, current behaviour)

Geometry MLMG outcome Iter scaling
Uniform / fully connected (np.zeros) Correct, fast Near-linear (the §4 plot, with warm-up point trimmed) — a=0 everywhere, pin never activates, coarsening clean
Porespy blobs / real microstructure Correct Super-linear — pin activates, coarsening degraded, 5%/iter

Proposed fix — EB-based MLMG (Phase 1)

Migrate TortuosityMLMG to MLEBABecLaplacian with the active mask encoded as embedded-boundary metadata.

AMReX's EB framework is designed for exactly this case: solid obstacles inside a fluid domain. Each cell carries a volume fraction vfrac (1 = fully open, 0 = fully covered) and each face an aperture (apx, apy, apz). The MG hierarchy is built with EB-aware restriction/prolongation: coarse cells track the union of their fine cells' coverage, and the EB-aware stencil at irregular cells is automatically derived from the apertures. No alpha*a pin needed — covered cells are simply absent from the system.

Why this should restore linear scaling:

  • Inactive (non-percolating) cells get vfrac = 0, are removed from the operator entirely
  • Active/inactive interface faces get aperture = 0, decoupling the two regions geometrically
  • MG coarsening preserves EB metadata correctly across levels — coarse-grid operators stay consistent with their fine counterparts
  • This is the AMReX-native analogue of HYPRE's row-decoupling, just with the bookkeeping done geometrically rather than algebraically

Why it's tractable for us:

  • AMReX is already built with -DAMReX_EB=ON in both containers/Dockerfile:93 and containers/Singularity.deps.def:176
  • The active mask we already compute in TortuositySolverBase::generateActivityMask (parallel flood fill from inlet & outlet seeds) is exactly the input EB needs — cell_activevfrac = 1.0, cell_inactivevfrac = 0.0
  • The EB2::Build + makeShop path with a custom implicit function (ActiveMaskIF) works without needing EB2::IF_* analytic shapes

Implementation (in progress on claude/issue-289-mlmg-eb-migration):

  • ActiveMaskIF struct converts the active mask into an EB implicit function (negative = fluid/active, positive = body/inactive)
  • EB2::Build(makeShop(if_obj), geom, ...)EBFArrayBoxFactoryMLEBABecLap
  • setScalars(0.0, 1.0) — pure Laplacian, no alpha*a pin
  • EB default Neumann BC on the body surface (no flux into solid)
  • Domain-boundary ghost cells of m_mf_diff_coeff filled by extrapolation before B-coefficient computation (MLEBABecLap uses user B at boundary faces, unlike MLABecLaplacian which overrides internally)
  • EB2::IndexSpace::pop() at end of solve() to avoid leaking EB metadata
  • mlmg.setThrowException(true) retained for graceful failure
  • eps restored to 1e-11, maxiter to 200

Early CI results (uniform 32³): MLMG converges in 10 iterations with ~14x reduction per V-cycle — healthy MG behaviour restored. Boundary flux conservation was initially failing due to uninitialised ghost D at domain boundaries; fixed by explicit ghost-cell extrapolation.

Relationship to #69 (Sub-Voxel Accuracy via EB)

This issue implements Phase 1 infrastructure from #69. The same EB2 → EBFArrayBoxFactory → MLEBABecLap → MLMG pipeline serves both:

This issue (#289) #69 Phase 2
Binary mask ActiveMaskIF (returns ±1) Continuous SDF SDFImplicitFunction (returns signed distance)
Every cell is fully regular or fully covered Interface cells are cut cells with 0 < vfrac < 1
Recovers MG scaling by removing inactive cells Adds sub-voxel geometric accuracy at interfaces

To go from #289#69, the only change is which implicit function is passed to makeShop. The rest of the pipeline stays identical.

Sub-voxel accuracy strategy (revised from #69 discussion)

The original #69 proposal focused on ingesting ML probability maps. After analysis, a revised strategy acknowledges that OpenImpala's primary input is binary segmented XCT data, and the approaches for improving accuracy on that data differ significantly in robustness:

Approaches considered, ranked by robustness on heterogeneous 3D microstructures:

Approach Accuracy Effort Robustness Notes
Richardson extrapolation on binary grid Best (rigorous) Low Excellent Solve at N³, 2N³, 4N³ and extrapolate. O(dx) staircase error has known convergence rate. Quantifies discretisation error as a byproduct. Cost: 3–4 solves at increasing resolution.
Composite voxel (upstream provides phase fractions) Good Trivial Excellent Set D_cell = phase_fraction directly. No EB needed. Requires upstream pipeline to provide partial-volume data instead of binary thresholding.
Effective-medium face coefficients (Schneider et al. 2019) Good Medium Good Smooth the operator, not the geometry. Face D derived from local neighbourhood porosity. Works on existing solver path.
Anti-aliased SDF → EB (Frisken et al. 2000) Good Medium-high Moderate Analyse 2x2x2 boundary neighbourhoods to estimate sub-voxel interface position. Better than naive distance transform for corners/thin features.
Naive SDF from distance_transform_edt → EB Variable Low Poor Rounds all corners to circles/spheres, can't distinguish real features from staircase artifacts, systematically distorts thin features (< 5 voxels).

Key insight: the naive SDF-from-binary approach (compute Euclidean distance transform, feed to EB) gives false confidence on heterogeneous data. The staircase error from binary voxels is well-characterised and converges as O(dx); the SDF-smoothed error is geometry-dependent and harder to quantify. For highly heterogeneous media (porespy blobs, real battery electrodes), the smoothing can actually make results worse by rounding away real microstructural features.

Recommended strategy:

  1. Phase 1 (this issue, TortuosityMLMG: recover linear scaling on porous geometry (alpha*a pin breaks MG coarsening) #289): EB with binary mask — proves the EB2 → MLEBABecLap pipeline, handles inactive-cell masking correctly, recovers MG scaling. No sub-voxel accuracy claim.

  2. Phase 2 (Feature Request: Sub-Voxel Accuracy via AMReX Embedded Boundaries (EB) #69): Accept an optional user-provided SDF/level-set (level_set= kwarg). Don't auto-compute from binary data. The user decides whether their upstream pipeline (ML segmentation, grayscale-to-SDF, partial-volume CT) produces something worth smoothing with. Document clearly: "For genuine sub-voxel accuracy, provide an SDF derived from the original grayscale or ML probability field, not from a distance transform on binary data."

  3. Don't implement auto-smoothing (smooth_eb=True). It would give users false confidence on heterogeneous data.

  4. For highest-accuracy validation on binary data: use Richardson extrapolation (solve at multiple resolutions, extrapolate). No geometric assumptions, quantifies discretisation error, works on any microstructure.

Implementation plan

src/props/TortuosityMLMG.{H,cpp}                    [done - on branch]
  - Replace MLABecLaplacian with MLEBABecLap
  - ActiveMaskIF implicit function from m_mf_active_mask
  - EB2::Build → EBFArrayBoxFactory → MLEBABecLap pipeline
  - Domain-boundary ghost-cell extrapolation for m_mf_diff_coeff
  - Drop alpha*a pin, drop dc_masked workaround
  - Restore eps=1e-11, maxiter=200

python/bindings/solvers.cpp                          [done - on branch]
  - Updated default eps/maxiter to match header

src/props/TortuositySolverBase.{H,cpp}               [optional]
  - Expose m_mf_active_mask for other solvers to adopt EB pattern

tests/tTortuosityMLMG.cpp                            [TODO]
  - Add non-uniform geometry test case

python/tests/test_mlmg_porespy.py                    [TODO]
  - Add iteration-count regression (≤20 V-cycles at 32³)

notebooks/profiling_and_tuning.ipynb                 [TODO]
  - Update §4 to validate scaling on porespy blobs, not np.zeros

Backup paths (lesser options)

  • Custom MLLinOp subclass carrying the active mask natively, overriding Fapply / Fsmooth / applyBC / restriction. Equivalent expressive power to EB but considerably more code and more chances to get the restriction wrong. Only worth considering if EB turns out to have a blocker we don't anticipate.

  • Wrap HYPRE BoomerAMG (algebraic multigrid). Linear scaling guaranteed on M-matrices regardless of mask. But: loses the matrix-free / low-memory property that justifies MLMG as the default in the first place, requires IJ-matrix assembly (we currently use the Struct interface), and adds AMG setup cost that's amortised poorly on small grids. Useful as a fallback for cases where EB can't handle the geometry (multi-phase D > 0 with high contrast?), but not the right primary path.

  • Compactify the active subdomain into a smaller BoxArray. Conceptually clean but the active region of porespy is highly irregular — load balancing collapses, and setLevelBC on a non-cuboidal domain isn't a thing in MLABecLaplacian. Reject.

Acceptance criteria

  1. MLMG reaches eps = 1e-11 in ≤ 20 V-cycles on porous_blobs (32³, ~50% porosity).
  2. Iteration count is bounded by a constant (say ≤ 30) across 32³, 64³, 96³, 128³, 256³ for porespy data. The §4 scaling fit on porous data gives p < 1.2.
  3. test_mlmg_matches_hypre_within_one_percent still passes.
  4. Bake-off (notebooks/profiling_and_tuning.ipynb §3) shows MLMG as the fastest solver on 64³ porespy, recovering the original "default solver since v4.3.0" claim.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions