Refactored soca -> cice by shlyaeva · Pull Request #1246 · JCSDA-internal/soca

shlyaeva · 2026-05-27T20:36:27Z

Description

Sea-ice analysis postprocessing in C++/atlas (PostProcessIce): replaces the Fortran Soca2Cice variable change with a C++/atlas postprocessor that projects an aggregate SOCA ice analysis onto a per-category CICE restart. The new algorithm is largely based on @DmitryDukhovskoy's Python scripts to insert ice concentration, thickness and snow depth (https://github.com/DmitryDukhovskoy/RTOFS_utilities/blob/gaea_pub/prepare_cice6/insert_iconc_ithkn_cice6_restart.py and https://github.com/DmitryDukhovskoy/RTOFS_utilities/blob/gaea_pub/prepare_cice6/insert_hsnow_cice6_restart.py). Validated on 7 production GDAS cycles spanning different seasons.

What's new

soca::PostProcessIce in src/soca/PostProcess/,
with a single public entry point:
```
soca::State PostProcessIce::postprocess(const soca::State & analysis) const;
```
Reads the CICE background restart, applies the per-cell pass, writes the postprocessed restart in update mode, returns an aggregate-ice State {aice, hice, hsno} matching what was written. Used by the standalone soca_postproc.x app, by AnalysisPostproc.h's ensemble loop, and by gdasapp's increment handler.
Per-cell pass (see README for details): case dispatch (LAND / ICE2NOICE / NOICE2ICE / ICE2ICE), ITD rebin (on by default), aicen-weighted snow distribution, optional freeboard enforcement, mass-conserving min-vice cleanup. Pure column-physics helpers (BL99 enthalpy, BZ99 salinity profile, thickness-category solver, freeboard) live in IcePhysics.h/.cc with their own unit tests in TestIcePhysics.cc.
New-ice thermo seeding: cells that transitioned from bg_aicen=0 to new_aicen>0 get Tsfcn from the area-weighted mean of a KDTree donor with any ice (global lat/lon tree, donor data gathered once via a sparse halo exchange); sub-surface qice/sice synthesized from CICE physics at the ocean freezing point.
soca_postproc.x standalone application
(src/mains/Postproc.h): takes background + increment, forms analysis = bg + incr, runs PostProcessIce. Also accepts an explicit analysis block.
Dedicated CICE restart writer on the soca::State side:
soca::State::writeCice(cfg) calls a new Fortran entry
soca_fields::write_cice that wraps the update-mode CICE writer (byte-copy input restart → output, overwrite only modelled variables, ~40 unmodelled CICE vars pass through).
fields_metadata.yml: new <LEVEL> placeholder for templated per-layer entries (CICE per-layer restart vars qice00N, sice00N, qsno00N); combines with <CATEGORY> to expand ncat × ice_lev entries automatically. Per-cat ice io name updated to restart naming (aice<N>, vice<N>, vsno<N> - dropping the _h suffix which is the CICE history convention; restarts don't carry it).
AnalysisPostproc.h: ensemble loop migrated to call PostProcessIce::postprocess per member. Soca2Cice variable change no longer used here.

Tests

Four end-to-end tests on the 72×35 grid (one per major code path) plusa pure unit test on the column physics:

test_soca_soca2cice_new - baseline (rebin on)
test_soca_soca2cice_new_freeboard - freeboard enforcement
test_soca_soca2cice_new_seed - new-ice seeding + thermo
test_soca_soca2cice_new_bgfallback - analysis variables resolver
(only aice analysed, hice/hsno fall back to background)
test_soca_icephysics - pure C++ unit test of the column-physics helpers
test_soca_postprocice_vs_soca2cice - Fortran-vs-C++ regression
(Soca2Cice output compared bit-wise to PostProcessIce output;
budget = 2 differing cells at 1e-10 tolerance)
test_soca_ensanpproc - ensemble postproc end-to-end

All pass. The legacy Fortran Soca2Cice path is still available through soca_convertstate.x for the regression test; it can be deprecated in a follow-up PR.

Configuration

A minimal yaml stanza:

postprocess ice:
  ncat: 5
  ice_lev: 7
  sno_lev: 1
  cice restart:
    input:  /path/to/cice_background.res.nc   # also the update-mode template
    output: /path/to/cice_postprocessed.res.nc

Full schema with defaults in src/soca/PostProcess/README.md.

Checklist

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have run the unit tests before creating the PR

… for restarts

Introduces src/soca/IO/soca_io_mod.F90 with soca_io_file_reader / soca_io_file_writer types (init/enqueue/commit/close), and routes all state, geometry, and balance-coefficient I/O through it. PE 0 calls nf90_get_var / nf90_put_var; mpp_broadcast and mpp_gather distribute / collect data on the geometry's pelist. Removes all fms_io_mod usages (register_restart_field, restore_state, save_restart, read_data, file_exist, field_exist, fms_io_init, fms_io_exit) from soca, including the global_soca_geom_counter shim. fms_init / fms_end are retained (subsystem init only, no I/O). Resolves the GDAS Tsnz_h shape mismatch by building nf90_get_var count from the file's actual dim sizes, not the caller's buffer rank. Per-domain state writes (ocn/sfc/ice/wav/bio) and reads now use the new module; the FMS code paths are removed in the same change rather than gated. Refs #1125. Obsoletes PR #1241.

Under FMS, the writer would implicitly add the .nc suffix, so test YAMLs and downstream readers reference 'ocn.<exp>.<typ>.<date>.nc'. The direct-netCDF writer in soca_io_mod is faithful to whatever string it's given, so files were landing without the extension and no consumer could find them.

Six soca_add_test() calls listed test_socs_parameters_diffusion as a dependency (should be test_soca_). Because the named test does not exist, ctest treated the dependency as void and ran the variational tests concurrently with parameters_diffusion under -j>1, causing races on the diffusion calibration outputs.

Drops the SOCA_IO_SERIAL / SOCA_IO_PARALLEL mode bifurcation (and the method= arg + io.method config knob) and rewrites the reader to do per-PE strided nf90_get_var, mirroring FMS mpp_io's domain-decomposed read pattern. Each PE opens the file once via NF90_NOWRITE and reads its own compute-domain tile via nf90_get_var(start, count) -- the same shape as fms_io's READ_RECORD_ for fileset.NE.MPP_SINGLE. Adds a module-level read cache mirroring fms_io's get_file_unit + files_read(i)%var(j) tables: - one nf90_open per (PE, filename) for the whole run - cached (varid, file_ndims, middle_dim sizes) per (file, var) - soca_io_close_all() flushes the cache; wired into soca_geom_end Caller updates: drop method= from reader/writer init() and the soca_io_method_from_config helper from soca_fields_mod / soca_geom_mod / soca_balance_mod. 1-deg 20-mem LETKF read phase (LocalEnsembleDA before solver ctor): baseline (FMS): 3.91 s before this commit (no cache): 13.92 s with this commit: 3.97 s Outputs are bit-identical.

…cache) Adds a second reader_commit implementation -- PE 0 nf90_get_var the global field + mpp_broadcast -- alongside the existing per-PE strided path. Both paths share the file-handle + var-metadata cache. Select via SOCA_IO_READ_MODE=broadcast|strided (default broadcast); the env var is latched on first reader_commit. Why broadcast as default: in DA cycling the page cache is always cold, since each cycle reads files that the previous cycle / model run just wrote. On cold cache the broadcast path (1 sequential read on PE 0, kernel readahead happy) beats strided (8 concurrent strided readers thrash the prefetcher and pay 8x the open-syscall cost). 1-deg 20-mem LETKF, rancor, taskset -c 0-7, cold cache (sample 1): broadcast: 13.5 s read strided: 30.5 s read Warm cache reverses the result (strided 3.9 s vs broadcast 13.3 s) but cycling never sees warm cache. On a parallel filesystem (Lustre/GPFS) or multi-node setup the strided path may win again -- toggle the env var to test.

Drops SOCA_IO_READ_MODE in favor of a yaml block on the geometry config: geometry: io: read mode: broadcast | strided If absent the module-level default (broadcast) is kept. soca_geom_init calls soca_io_read_mode_from_config(f_conf) once at startup; the choice latches for every subsequent reader_commit. Unrecognized values abort so typos surface immediately.

Annotates testinput/letkf.yml's geometry block with the two read-mode options so a contributor reading the canonical letkf example sees both choices without having to hunt for the soca_io_mod docstring.

…t-io

The write path emits FMS-style auto-numbered dim names (xaxis_N / yaxis_N / zaxis_N / Time); the file-header docstring incorrectly claimed xh / yh / Time. Correct the comment -- no code change.

Writer: - enqueue holds pointers to caller buffers instead of allocating and copying (mirrors FMS register_restart_field contract). Compute-slice extraction moves from enqueue to commit, so peak per-writer memory drops from sum(var_bytes) to one (nx_c x ny_c) tile. - commit uses the 3D mpp_gather overload for 3D vars: one collective per var instead of nlevels, and no per-level gbuf3d(:,:,k) = gbuf2d memcpy on root. - Reuse gbuf3d / tile3 across 3D vars when nlevels matches. Reader: - commit_reader_strided hoists tile2 out of the per-var loop and reuses tile3 / tile4 when trailing dims match. - Dead tile3/tile4 zero-inits removed (read_var_strided fully fills the buffer). read_var_strided: fixed-size stack arrays for st / ct instead of per-call heap allocation. put_axis_coord_data: reuse idxbuf across axes. Cleanup: - Remove dead 'use mpi' and 'use fckit_configuration_module'. - Remove dead cartesian_axis parameter on writer_enqueue_1d (it was stored but never written to the netcdf output). - Drop unused nprocs out-param of mpi_pelist. - soca_io_close_all: use ncc on nf90_close instead of silently discarding the status. - commit_reader_scatter: matching dead-init cleanup; comment updated to flag it as pending parallel-ensemble I/O exercise. - Trim verbose / redundant comments.

Drop dangling references to the removed soca_io_read_mode_from_config / 'io.read mode' YAML key in soca_geom_mod and the letkf test input. The selector was already gone from soca_io_mod; without these the PR fails to compile.

- soca_io_mod: allocate gbuf2d / gbuf3d as 1x1 dummies on non-root so the actuals passed to mpp_gather are always allocated (assumed-shape dummy requires it). Pattern matches commit_reader_scatter. - soca_io_mod: refresh stale 'Data is copied in' comment to reflect the pointer-based enqueue semantics (caller buffer kept alive/unmutated through commit; actual must satisfy TARGET association rules). - soca_geom_mod: TARGET on the 'self' dummy in soca_geom_init, soca_geom_init_fieldset, and soca_geom_write so the allocatable components (self%lonh, self%lat, ..., self%mask2d*) are valid pointer targets for the soca_io reader/writer enqueue. TARGET on the local fieldData / fieldDataVars too. - soca_fields_mod: TARGET on h_common and on the local vars(:) wrapper array so vars(n)%data sections used in enqueue are valid pointer targets. - soca_balance_mod: TARGET on local kct for the same reason.

…e checks Three related fixes in the direct-netCDF reader path: 1. read_var_strided previously hardcoded "last Fortran dim is trailing time" and set ct(file_ndims)=1 unconditionally. For a file with spatial-only layout (Temp(z,y,x) with no leading Time), this read only level 1 of z into a multi-level destination, leaving the rest uninitialized -- showing up as garbage scale values in the vertical diffusion calibration. Now discriminate file_ndims == dst_rank (no time, fill every dim from the file) vs file_ndims > dst_rank (trailing time + middle squeeze, e.g. CICE Tsnz_h's nksnow=1). A total-element-count check catches silent partial-fills and dim-size mismatches (e.g. file z=75 vs destination z=25). 2. Drop the read_cache / cached_open / soca_io_close_all machinery. Each reader_commit nf90_opens, reads all enqueued vars (with inline inq_varid + inquire_variable + inquire_dimension; microseconds), and nf90_closes. Holding NetCDF4 handles open across commits was bloating LETKF per-task memory by ~MB-to-GB scale (HDF5 metadata + chunk caches per open file). Restores per-task max to match develop. 3. Add check_buf_1d / check_buf_2d helpers called from every reader/writer enqueue. Catches unallocated and wrong-sized caller buffers at the enqueue site instead of silently storing the pointer for a later get_var/put_var to mishandle.

# Conflicts: # src/soca/Fields/soca_fields_mod.F90

Small, no-behavior-change cleanups before the bigger refactors: - Q5: delete unused 5-arg adjustThicknessCategories back-compat overload. Update the four TestIcePhysics calls to the 9-arg form; rewrite the _InfeasibleTarget test since legacy semantics no longer apply. - Q9: remove rho ice/snow/ocean from FreeboardParameters. They were always CICE's defaults; making them configurable would let users write a restart inconsistent with CICE's physics. Use icephysics::Constants directly. Strip the now-unused rho lines from 3 test yamls. - Q10: rewrite seed new ice docstring to match the 2026-05-20 change (qice from Tfrz - Tice_seed_offset, only Tsfcn from donor). - Q12: move applyThermoStage and CatRecord to the private section. - Q13: replace the 12-line per-cat view loop with 6 ppiCatViews calls. - Q16: delete snowEnthalpyToTsfc (only used by the removed Tsfcn-from-qsno path), the unused 'sum' helper in IcePhysics.cc, and the (void)sum warning suppression. Replace round-trip unit test with a known-value test. - Q7/Q8: docstring fixes for 'min thickness gap' (numerical robustness knob) and 'max snow thickness' (explicitly per-ice-area, m). All 10 postprocice/soca2cice/icephysics/ensanpproc tests still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The rebin pass is what makes per-cat thicknesses sit inside their bins; running without it leaves the restart in a state CICE will quickly rebin on its own, and is only useful as a legacy diagnostic. Production users get correct behavior automatically; the legacy no-rebin path stays available via 'itd.rebin: false' in yaml. soca2cice_new.yml and soca2cice_new_bgfallback.yml now run with rebin on (no yaml change needed; they don't set itd: explicitly). Testrefs are empty so no refresh needed. All 10 postprocice tests pass, including the Fortran-vs-C++ regression with its 2-cell tolerance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Six soca2cice_new* yamls collapsed to four: - soca2cice_new (old non-rebin baseline) deleted; redundant now that rebin is the default. soca2cice_new_rebin renamed to take its slot with the itd: block dropped (all defaults match). - soca2cice_new_thermo deleted; its scope (thermo: {update snow thermo, reset ponds}) is a strict subset of soca2cice_new_seed. Kept: soca2cice_new (new production baseline, rebin on), _freeboard (only freeboard coverage), _seed (full Stage C donor-halo seed), _bgfallback (analysis variables resolver). Also added soca2cice_new_freeboard.yml/_seed.yml + their .test files to the explicit soca_test_input/soca_test_ref lists so they survive a cmake reconfigure (previously these were only registered by soca_add_test, which doesn't add them to the configure-time symlink sweep). All 8 remaining postprocice tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- 'min aice' was an input clamp on analysis aice, defaulted to 0.0 (a no-op), and never set by any caller. Its role is fully covered by 'min aice output' (which clamps the *output* and is set to 1e-4 by default to drop near-zero analysis aice cells). The input clamp now just enforces nonneg; remove the parameter. - 'min vice' renamed to 'min cat ice volume' to disambiguate from 'min new ice thickness' (per-cell new-ice fallback). Docstring rewritten to explain it's the per-cat volume floor that drives the mass-conserving sweep after Stage A/B. Default unchanged at 1e-5. The legacy Fortran Soca2Cice path keeps its own 'min aice' / 'min vice' keys; those are separate parameters. All 8 postprocice tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The only consumer of ocean T/S inside postProcess was the SST-on-ice2noice adjustment: when the analysis removes ice from a previously-iced cell, the surface ocean was optionally warmed toward Tfrz so the post-DA SST was consistent with no ice. The salinity field was copied in but never read. Both are removed: - SstUpdateParameters block and its 'sst update:' yaml key. - copyOceanFields helper (static); the ocean T/S copy at the top of postProcess; the new_tocn writable view. - The Case-1 ICE2NOICE branch's SST warming code. Rationale: surface-ocean adjustment on ice removal is a DA-increment concern, not a CICE-restart concern. Folding it back into the MOM6 IAU increment (gdasapp/utils/soca/gdas_incr_handler.h, removed in a parallel commit) was working around the wrong layer doing the work. Future revisions, if needed, should adjust SST upstream where the ocean increment is generated. readRestart's State now carries only sea-ice fields; PostProcessIce neither reads nor writes ocean variables (ocean T/S on the analysis is unused). All 8 postprocice tests pass, including the Fortran regression test with its 1e-10 tolerance / 2-cell budget. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The generic soca_fields_write_rst no longer scans the config for a 'cice template' key. Instead: - new Fortran subroutine soca_fields_write_cice that handles the config -> soca_fields_write_cice_rst dispatch; - new C entry soca_state_write_cice_f90 / soca::State::writeCice; - PostProcessIce::writeRestart now calls pproc.writeCice(cfg) instead of pproc.write(cfg) with the magic 'cice template' key. write_rst goes back to being a pure domain-by-domain restart writer that knows nothing about CICE update mode. Callers that want CICE update mode go through writeCice explicitly. All 8 postprocice tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The three call sites (soca's Postproc.h, AnalysisPostproc.h, and gdasapp's gdas_incr_handler.h) all did the same triple: State restart = ppIce.readRestart(geom, analysis.validTime()); State pproc(geom, restart); ppIce.postProcess(pproc, restart, analysis); ppIce.writeRestart(pproc, analysis.validTime()); Replaced with a single public method: State PostProcessIce::postprocess(const State & analysis) const; which opens the CICE restart, applies Stage A/B/C in update mode, writes the postprocessed restart, and returns an aggregate-ice State on analysis's geometry carrying {sea_ice_area_fraction, sea_ice_thickness, sea_ice_snow_thickness} matching what was written. readRestart, writeRestart, ciceRestartVariables, and the case-dispatch loop (now runPostprocess) are private. The aggregate-ice return value is what the gdasapp pipeline / scripted callers can consume without having to know the per-cat CICE schema. Postproc.h and AnalysisPostproc.h migrated. The gdasapp side migrates in a parallel commit. All 8 postprocice tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

soca_coding_norms (include-what-you-use linter) flagged this on line 57. The header has been using std::make_unique since the file was introduced but pulled <memory> transitively; the linter wants it explicit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Self-review pass on the PR-as-a-single-change: - Remove unused includes from PostProcessIce.cc (<limits>, <numeric>, atlas/mesh/Mesh.h, atlas/mesh/Connectivity.h, atlas/mesh/actions/BuildCellCentres.h). - Drop project-internal labels from comments and docstrings: - 'Stage A / Stage B / Stage C+' inside the per-cell loop → use what the code actually does ('Per-cell pass', 'ITD rebin', 'Snow distribution', 'Freeboard enforcement'). - 'Phases A+B+C' in the donor-halo gather header → 'Sparse halo exchange'. - 'user decision 2026-05-11', '+2.6 K warm bias on the production grid', 'With the shuffle path removed', 'rebin shuffle', 'dd' shorthand, external Python script filenames ('mc6util.adjust_*', 'insert_iconc_ithkn_cice6_restart.py'). - Stage references in PostProcessIce.h ('Parameters for the Stage C thermo / pond pass', 'after Stages A/B', etc). - Fix the log message at the end of runPostprocess: was printing (the background, unchanged) instead of (the postprocessed state). Renamed all 3 info-log lines to use the consistent 'PostProcessIce:' prefix. - Group all parameter pickups at one place rather than 'additional knobs picked up later in the function'; move seedK with the rest. - Rewrite stale parameter docstrings ('Parameters for adding soca increment to CICE restart files' → 'Top-level parameters'; 'Stage C thermo' → 'Per-layer thermo / pond reset', etc.) and drop the 'matches the Python reference scripts' / 'matches dd' qualifiers in favor of the actual physical rationale. - Update Tice_seed_offset / Tsno_clip_max comments in IcePhysics.h to state what they are, not what they 'match' in an external script. - Add src/soca/PostProcess/README.md documenting the workflow, per-cell pass, thermo pass, new-ice seeding, full yaml configuration schema with defaults, and the implementation notes (per-cat field naming, KDTree halo, IcePhysics helpers). Style matches the existing Soca2Cice README. All 8 postprocice tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

B-list (questions on the comprehensive PR diff): - B4: validate `itd.category bounds` length (ncat+1) at PostProcessIce construction, not only when rebin runs. Default is the CICE5 ncat=5 layout; ncat != 5 without bounds override now fails fast regardless of rebin/freeboard. - B5: drop the redundant `itd:` block from ensanpproc.yml. With rebin defaulted on and the only knobs in that block being the CICE5 defaults, the block was pure noise. - B6: rewrite the AnalysisPostproc.h comment around seekAndReplace to state where %mem% actually fires (cice restart: input/output paths). - B1: move tools/postproc_compare/ out of the soca submodule. Per the README it's a "study tool, not a CI test", and once Soca2Cice is deprecated this becomes a one-shot deprecation artifact. The directory lives at jedi-bundle/tools/ now (untracked, alongside the user's other local helper scripts). D-list (cleanup found in the self-review): - D1: drop the double `pproc = restart;` in runPostprocess. `postprocess` already did `State pproc(geom_, restart);` (the copy-constructor). - D2: stop adding aggregate ice diagnostic atlas Fields onto pproc.fieldSet(). The aggregates were quiet violations of the State invariant (extras bypassing soca::Fields's tracked-fields list) and only existed to be read back by the postprocess() wrapper. Instead, runPostprocess now constructs the aggregate-ice result State on the analysis geometry and writes the aggregates directly into its fieldset; it returns the State. pproc keeps the per-cat schema the CICE-restart writer expects. - D3: drop the empty `print(std::ostream &)` override and the util::Printable inheritance. PostProcessIce is never streamed anywhere; the inheritance was dead weight. - D4: bg_aice atlas Field replaced with std::vector<double> for consistency with a_hice_vec / a_hsno_vec. Same memory cost, no pseudo-field on the working FieldSet. All 8 postprocice tests + coding norms pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The aggregate diagnostics are no longer added onto pproc and read back; runPostprocess builds the result State directly. Reword the workflow description accordingly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

PostProcessIce uses the CICE restart input both as the update-mode template and (configurably) as the output path. When users configure 'cice restart: { input: foo.nc, output: foo.nc }' for in-place update, the byte-copy is wasted work: the file already contains its own template. writer_commit_update now checks for equal template/filename and skips the copy when they match, opening the existing file directly NF90_WRITE. Non-equal paths still go through the byte-copy as before, preserving the safe "keep an unmodified backup" semantics for users who want them. Path comparison is a literal string equality; cosmetic differences like './foo.nc' vs 'foo.nc' would still trigger the (harmless) self-copy. Good enough for the common gdasapp case where the two yaml fields hold the exact same string. Verified manually: running soca_postproc.x with equal input/output paths mutates the bg restart in place (different md5 after the run); all 9 existing postprocice tests still pass (they configure distinct paths). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

shlyaeva · 2026-05-27T21:03:25Z

PostProcessIce: production-grid validation across 7 dates

Side-by-side comparison of the new C++/atlas PostProcessIce path against the legacy Fortran Soca2Cice path and @DmitryDukhovskoy's Python prepare_cice6 scripts (dd here), on retro + realtime GFSv17 data (0.25deg grid).

Dates (one per season + multi-year coverage):
20240715, 20250115, 20250330, 20250715, 20250920, 20251221, 20260322.

For each cycle we hold the same DA analysis (from GFSv17 with artificially increased ice thickness and snow depth to test insertion of those variables) and the same CICE background fixed, and compare the resulting CICE restart written by each of the three pipelines.

Aggregate cell-level RMSE vs analysis

Mask: cells where analysis.aice > 1e-4. Numbers are RMSE averaged over the 7 dates; units are ice concentration (0-1) for aice and per-cell volume (m³/m²-cell) for v_ice / v_sno. Lower is better.

field	new	legacy	dd	background
aice	0.00331	0.00507	0.00331	0.0426
v_ice	0.0885	0.258	3.91*	0.360
v_sno	4.4e-5	0.0741	0.00209	0.0887

* dd's high v_ice RMSE is a known unit-convention mismatch; should not be considered a regression in this PR.

Per-category ICE2ICE perturbation vs background

Mask: per-cat slots where both bg.aicen > 1e-4 and new.aicen > 1e-4. This measures how much the postprocessing perturbs the background per-cat shape on cells that have ice in both bg and analysis - lower is better (less perturbation = more CICE-stable restart). Averaged over the 7 dates.

field	new	legacy	dd
aicen	0.0184	0.190	0.0185
vicen	0.094	0.248	0.113
Tsfcn (°C)	0.32	8.44	1.85
sice001	0	2.90	0.74

aicen: new ties dd, both beat legacy by ~10×.
vicen: new is the lowest perturbation.
Tsfcn (per-cat surface temperature): new's perturbation is bounded by the −1 °C clamp (0.13 - 0.64 °C range across dates); legacy runs hot at 4.5 - 11.5 °C; dd is in between at 1.2 - 2.6 °C.
sice001: new preserves bg ice salinity exactly (per-cat alpha-rescale doesn't touch it); both legacy and dd perturb it by O(1) ppt.

Plots

Three production-grid comparisons for 20240715 (arctic summer / antarctic winter). Each figure has 3 rows (new / legacy / dd) and 4-5 columns (background, analysis, updated, one or two updated − analysis diff
panels at different colour ranges):

Ice thickness in the Arctic:

Clearest contrast: new's diff panel is white almost everywhere (close-to-analysis ice thickness); legacy has a large blue interior blob (under-thick ice across the entire pack); dd has a dense red rim that should be ignored (not a regression).

Snow thickness in the Arctic:

new and dd diffs essentially zero (snow tracks analysis exactly); legacy shows a large interior blue patch (snow under-prediction).

Ice thickness in the Antarctic:

Same pattern in the SH: new clean diff, legacy interior bias, dd rim (ignore).

All 7 dates exhibit the same qualitative behaviour - the new path is consistently close to the analysis on ice thickness and snow (significantly better than legacy).

Bottom line

No degradation compared to Dmitry's scripts, significant improvement for ice thickness and snow depth compared to the legacy soca2cice.

shlyaeva · 2026-05-27T21:08:53Z

I'm going to run a few cycles with the ensembles using this postprocessing and report back here. In the meantime, this is mostly ready for review, I don't anticipate major changes. If it looks painful to review, just know that I've been working on this on and off for half a year, and I'm with you 😆 .

CICE6's linear_itd aborts when per-cat thickness `hi = vicen/aicen` falls outside `[hicat[k], hicat[k+1]]`. Diagnostic scan of the existing production-grid restarts (datatoanalyze/, 7 dates) found ~5% of ice cells in every restart had at least one populated cat with out-of-bin hi -- ~5,000-16,000 cells per date. CICE6 was crashing on these. Two bugs at play: 1. **Default `category bounds` was CICE5 layout** with bounded top (`{0, 0.6445, 1.391, 2.470, 4.567, 9.334}`) whereas CICE6 GFSv17 uses `{0, 0.64, 1.39, 2.47, 4.57, 1000}` (open top). Silent mismatch produced restarts CICE refused to read. Make `category bounds` a RequiredParameter -- users must spell out the layout matching their CICE version. 2. **The standalone min-vice cleanup block** in PostProcessIce.cc (now removed) redistributed mass from dropped cats to surviving cats proportional to surviving aicen, without respect to bins. This shifted survivors' per-cat hi outside their bins, producing the CICE6 abort cases. The dd Python reference (mod_cice6_utils.py adjust_thkncats_aice) doesn't have a separate min-vice cleanup -- its scipy.optimize.minimize call uses the bin edges as variable bounds, so survivors are guaranteed in-bin. We rely on the rebin's existing `ainMin = 1e-8` clip + Step 7 renormalisation (which preserves per-cat hi), and add a final per-cat bin clip as defence in depth: const double hClipped = std::min(std::max(h, hLo), hHi); if (hClipped != h) vicen[k] = aicen[k] * hClipped; Aicen is left alone (Σaicen preserved); a small amount of mass is lost to the clip, much smaller than the test failures the bug produced. Drops the now-unused `min cat ice volume` yaml parameter. Test fallout: the 4 soca2cice_new* testrefs shift slightly (min-vice cleanup is gone). The freeboard test shifts hardest (hice 0.4153 -> 0.4044, because the cleanup was undoing some of freeboard's ice-volume growth). Test yamls updated to set `category bounds` explicitly. 9/9 postprocice tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Log::info() writes only from task 0 by oops convention, so all the counter / max-magnitude diagnostics in runPostprocess were silently reporting only rank 0's piece of the grid -- misleading on multi-task runs. allReduceInPlace each counter (sum) and each magnitude (max) on geom_.getComm() before the log lines, so what gets printed is the global total / global max. Covers: - rebin_visited, rebin_aicen_mutations (sum) - rebin_max_delta_aicen_all (max) - rebin_failures, freeboard_failures (sum) - bin_clip_slots (sum) - bin_clip_max_abs_dh (max) - seedNewIce fallbacks (sum, separately because seedNewIce runs after the main loop) Also adds the bin-clip diagnostic itself (was uncounted): per-cat slots clipped + max |dh| in metres. Only printed when at least one slot was clipped. 9/9 postprocice tests still pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Log::warning fans out to every MPI task, so the per-rank warnings were printing N identical copies of the global-summed counts. Switch the ITD-rebin-failure, freeboard-failure and noice->ice Tfrz-fallback messages to Log::info (rank-0-only). These are diagnostic counters about non-fatal events the code already handles, so info is the right level. Mention the bin clip in the rebin-failure message so the log explains how out-of-bin cells get repaired. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Replace the 'min-vice cleanup' step in the per-cell pass with the new 'per-cat bin clip' step (the cleanup is gone; the clip is the guarantee that the restart is CICE-readable). - Drop the 'min cat ice volume' configuration entry (parameter no longer exists). - Mark 'itd: category bounds' as required in both the schema block's inline comment and the closing 'required fields' bullet. Switch the example value to the CICE6 GFSv17 layout ([0, 0.64, 1.39, 2.47, 4.57, 1000]) and document the CICE5 alternative; explain that the previous silent default was a footgun. - Tidy two surrounding references to 'min-vice cleanup' that pointed at the dropped step. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

shlyaeva and others added 30 commits April 27, 2026 10:47

First version: updates aice, vice, vsno almost like in soca2cice

58007bc

soca2cice updates: more code in postprocess + helpers; new C++ output…

db385a0

… for restarts

More ice physics; add CICE restart I/O

536bdde

add tools for postprocessing

0713be2

Comparison scripts

c878799

update scripts

e80749a

better ice thickness and snow depth handling (hopefully)

374955b

fix tsfcn issue + some cleanup of unused yaml parameters

6bd42ba

soca letkf ctest: document soca_io read mode block

4652de9

Annotates testinput/letkf.yml's geometry block with the two read-mode options so a contributor reading the canonical letkf example sees both choices without having to hunt for the soca_io_mod docstring.

Merge remote-tracking branch 'origin/develop' into feature/soca-direc…

8c16936

…t-io

soca_io_mod: fix stale docstring about output dim names

1449bc6

The write path emits FMS-style auto-numbered dim names (xaxis_N / yaxis_N / zaxis_N / Time); the file-header docstring incorrectly claimed xh / yh / Time. Correct the comment -- no code change.

some additional fixes

514bf0d

claude was BAD and forgot to commit this

4f33324

Drop dangling references to the removed soca_io_read_mode_from_config / 'io.read mode' YAML key in soca_geom_mod and the letkf test input. The selector was already gone from soca_io_mod; without these the PR fails to compile.

support ice volume and snow volume or no thickness/volume variables

6a3a3ee

use new soca2cice in ens postproc

1d03ae8

Merge branch 'develop' into feature/new_soca2cice

db37837

Merge branch 'feature/soca-direct-io' into feature/new_soca2cice

aaf0859

# Conflicts: # src/soca/Fields/soca_fields_mod.F90

Update I/O

65aabb8

Merge branch 'develop' into feature/new_soca2cice

817888e

Cleanup

c46220e

fix the moreice test

37f17d4

shlyaeva and others added 16 commits May 26, 2026 15:41

clean up and simplification

08f5fef

temporary for testing

61f9bc4

README: align with the runPostprocess-returns-State refactor

f43e500

The aggregate diagnostics are no longer added onto pproc and read back; runPostprocess builds the result State directly. Reword the workflow description accordingly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add test prints and update references

2c21ec6

Merge branch 'develop' into feature/new_soca2cice

7bca476

shlyaeva requested review from Dooruk, guillaumevernieres and travissluka May 27, 2026 21:08

shlyaeva requested a review from DmitryDukhovskoy May 27, 2026 21:09

shlyaeva and others added 6 commits May 27, 2026 18:52

Better prints in ensemble analysis postprocessing

0665c9e

correct obsolete comment

2368857

shlyaeva mentioned this pull request May 28, 2026

Better sea ice thickness and snow depth insertion into CICE files NOAA-EMC/GDASApp#2152

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactored soca -> cice#1246

Refactored soca -> cice#1246
shlyaeva wants to merge 52 commits into
developfrom
feature/new_soca2cice

shlyaeva commented May 27, 2026 •

edited

Loading

Uh oh!

shlyaeva commented May 27, 2026

Uh oh!

shlyaeva commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

shlyaeva commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

What's new

Tests

Configuration

Checklist

Uh oh!

shlyaeva commented May 27, 2026

PostProcessIce: production-grid validation across 7 dates

Aggregate cell-level RMSE vs analysis

Per-category ICE2ICE perturbation vs background

Plots

Bottom line

Uh oh!

shlyaeva commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shlyaeva commented May 27, 2026 •

edited

Loading