Skip to content

refactor: behavior-preserving cleanup sweep across the engine#87

Open
ccomb wants to merge 45 commits into
mainfrom
advanced_haskell
Open

refactor: behavior-preserving cleanup sweep across the engine#87
ccomb wants to merge 45 commits into
mainfrom
advanced_haskell

Conversation

@ccomb
Copy link
Copy Markdown
Owner

@ccomb ccomb commented May 25, 2026

Summary

A broad, behavior-preserving refactoring sweep across the engine. The recurring theme is replacing hand-rolled boilerplate and ad-hoc plumbing with the standard typeclass abstractions that already fit each problem — so the structure becomes explicit and the duplication disappears. No feature work; net ~290 fewer lines across 27 files.

The main strands:

  • JSON encoding via a Stripped DerivingVia carrier — one newtype replaces ~100 near-identical ToJSON/FromJSON/ToSchema instances in API/Types, Types, and the OpenApi schema modules. Each type now self-encodes by deriving, co-located with its declaration.
  • AppM = ReaderT AppEnv Handler — Servant handlers read their environment instead of threading it through arguments; hoistServer collapses the wiring at the edge. Errors route uniformly through throwServiceError.
  • Monoids instead of bespoke merge functionsUnlinkedSummary, MappingStats, CrossDBLinkingStats, and TreeStats become Monoids, so accumulation collapses to mempty/<>/foldMap and the hand-written empty/merge pairs are deleted.
  • Validation applicative for accumulating aggregate-endpoint query-param errors instead of failing on the first.
  • Parser cleanups — EcoSpold 1/2 SAX folds deduplicated into shared handlers, the SimaPro method parser modeled as a single Stage ADT, and SimaPro process/param plumbing slimmed.
  • Service and MCP deduplication — graph builders extracted into named helpers, exchange handling unified via FlowKind dispatch, MCP tool handlers flattened onto a shared ExceptT idiom.
  • Module splitDatabase.MatrixBuild extracted out of Database; CLI optparse-applicative builders factored.

Four small bug fixes surfaced along the way and are included: keep the JSON field name when prefix-stripping would empty it, keep the root graph node below the cutoff, emit SimaPro categories that have no rows, and align the OpenAPI schema for MissingSupplier/DependencyChoice/DependencyStatus.

Verification

  • cabal build all clean — no new warnings vs the pre-branch baseline.
  • cabal test green.
  • dump-openapi matches the pre-branch baseline (modulo the intentional schema-alignment fix above).

ccomb added 22 commits May 26, 2026 00:29
Introduce `newtype Stripped a` in API.JsonOptions exposing ToJSON,
FromJSON, and ToSchema instances that delegate to the existing
strippedToJSON/strippedToEncoding/strippedParseJSON/strippedSchemaOptions
helpers. The instance bodies bypass the user-facing typeclass on `a` and
call the Generic helpers directly, so a user-side
`deriving via (Stripped Foo) instance ToJSON Foo` does not recurse.

This is pure addition — no existing instance is converted yet. The next
two commits collapse ~180 hand-rolled instance blocks in src/API/Types.hs,
src/Types.hs, and src/API/OpenApi.hs onto this carrier.

Categorically: Stripped a is a zero-cost iso (Coercible) of a, so the
Generic Rep is shared; the typeclass dictionary derived for Stripped a
transfers along the iso to a itself.
Convert hand-rolled `instance ToJSON X where toJSON = strippedToJSON; ...`
and `instance FromJSON X where parseJSON = strippedParseJSON` blocks for
70 record types in API/Types.hs to attached deriving clauses:

    data Foo = Foo {...}
        deriving (Generic)
        deriving (ToJSON, FromJSON) via (Stripped Foo)

The wire format is unchanged (openapi.json byte-identical against the
pre-branch baseline; full hspec suite green at 1052/1052).

Kept manual: ApiFlow (custom tagged-union shape), NodeType / EdgeType /
FlowRole (default generic sum encoding), PerturbedEntry (custom
Either-flatten).

ToSchema migration deliberately deferred: instances live as orphans in
API/OpenApi.hs and depend on types (Exchange, ApiFlow) whose schemas
are themselves defined there, creating a compilation-order cycle if we
try to derive in API/Types.hs. Will revisit as its own commit.
Convert the manual `instance ToJSON / FromJSON … where toJSON =
genericToJSON stripLowerPrefix; …` blocks for Pedigree, Exchange,
Compartment, TechnosphereFlow, BiosphereFlow, and WasteFlow into
attached `deriving (ToJSON, FromJSON) via (Stripped X)` clauses on
each data declaration.

TechRole and BioDirection keep their empty `instance ToJSON …`
declarations (default Generic encoding for nullary-constructor sum
types — the wire form is the constructor name as a JSON string).

openapi.json byte-identical against the pre-branch baseline;
hspec suite green at 1052/1052.
…rrors

Add Data.Validation — an Applicative-only type that accumulates errors
via the Semigroup on @e@. The canonical example, in Milewski's /Category
Theory for Programmers/, of an Applicative that is not a Monad: a lawful
Monad instance would have to short-circuit on the first Failure to keep
>>= associative w.r.t. the underlying Either behaviour, which would
erase exactly the accumulation that motivates the type. So we stop at
Applicative.

Pure addition — no caller migrated in this commit. The next commit
replaces the nested case/throwError ladder in
src/API/Routes.hs:1296-1318 (Aggregate query-param validation) with an
Applicative chain over Validation (NonEmpty Text), so a request with
both an invalid `scope` AND an invalid `aggregate` parameter reports
both errors instead of just the first.
Replace the nested case/throwError ladder in the aggregate endpoint's
parameter validation (Routes.hs:1296-1318) with an Applicative chain
over Validation (NonEmpty Text). A request with multiple bad query
params (e.g. invalid `scope` AND invalid `aggregate`) now reports
every error at once, joined by "; ", instead of just the first.

Categorically: Validation accumulates over the Semigroup of its error
type. The cross-check between `scope` and `filter_exchange_type` stays
in Handler because Validation is not a Monad — by design, since a
lawful Monad would have to short-circuit and erase the accumulation.

openapi.json byte-identical; hspec suite green at 1052/1052.
Replace the hand-rolled emptyUnlinkedSummary / mergeUnlinkedSummaries
pair with a Semigroup + Monoid instance on UnlinkedSummary. Call sites
that used `foldr mergeUnlinkedSummaries emptyUnlinkedSummary summaries`
collapse to `mconcat summaries`; the scattered `emptyUnlinkedSummary`
constructor calls in fixExchangeLink and fixExchangeLinkByName become
`mempty`.

Categorically: UnlinkedSummary is the product of four monoids — a
Map T.Text [UnlinkedExchange] under unionWith (++), plus three Int
counters under addition. The instance is hand-written rather than
derived via Generically because bare Int has no canonical Monoid
(Sum vs Product is ambiguous) and wrapping the fields as Sum Int
would have leaked through every constructor and accessor.

Tests in test/LoaderSpec.hs updated to exercise the new <>/mempty
surface.
Add App.Env exposing:
  - AppEnv: the read-only application environment (DatabaseManager,
    max tree depth, optional password, hosting config, classification
    presets) currently threaded explicitly through five lcaServer
    parameters and into every handler.
  - AppM = ReaderT AppEnv Handler, with GeneralizedNewtypeDeriving on
    Functor / Applicative / Monad / MonadIO / MonadReader / MonadError.
  - runApp :: AppEnv -> AppM a -> Handler a, the natural transformation
    AppM ~> Handler that Servant's hoistServer will use to lift a
    ServerT api AppM into ServerT api Handler at the API boundary.
  - Five "Has*" capability classes (HasDatabaseManager, HasMaxTreeDepth,
    HasPassword, HasHostingConfig, HasClassificationPresets) — narrow
    typeclass projections of the environment so handlers can declare
    only the slice they consume.

Pure addition. No caller migrated in this commit. The next commit
threads AppM through lcaServer and the ~30 handler bodies, removing
the requireDatabaseByName + getMergedUnitConfig boilerplate that
currently begins every handler.

Categorically: AppM lives in the Kleisli category of Reader AppEnv
lifted into Handler's Kleisli category. The Has-pattern is the
projection morphism from the environment object onto each component.
Convert lcaServer and the underlying handler / helper layer in
src/API/Routes.hs and src/API/DatabaseHandlers.hs to live in the AppM
capability monad (ReaderT AppEnv Handler) introduced in the prior
commit. The natural transformation `runApp env :: forall a. AppM a ->
Handler a` is applied via Servant.hoistServer at the API boundary, so
the LCAAPI route type itself is unchanged and openapi.json stays
byte-identical against the pre-branch baseline.

What changes architecturally:
  - lcaServer :: DatabaseManager -> Int -> Maybe String
      -> Maybe Config.HostingConfig -> [Config.ClassificationPreset]
      -> Server LCAAPI
    becomes lcaServer :: AppEnv -> Server LCAAPI, with the five env
    fields exposed via let-bindings inside the where block for the
    handlers' closure-style access (LOC-equivalent to inserting an
    `asks` at every call site, and easier to read).
  - All ~30 fanout calls of DBHandlers.X dbManager collapse to
    DBHandlers.X — DBHandlers is now AppM-typed end to end.
  - The shared cross-DB helpers (requireDatabaseByName,
    requireFullyLinked, inventoryWithDeps, solutionWithDeps,
    inventoriesWithDeps, solutionsWithDeps, activityLCIABatchH,
    batchImpactsH) drop their DatabaseManager-as-first-arg parameter
    and read it from the env via `asks getDatabaseManager`.
  - app/Main.hs packs the AppEnv once with mkAppEnv before passing it
    to lcaServer.
  - src/API/BatchImpacts.hs (the non-Servant MCP-side wrapper) builds
    a minimal AppEnv on the fly and discharges the AppM with runApp +
    Servant.runHandler.

What does *not* change:
  - The LCAAPI route surface and JSON contract (openapi.json
    byte-identical).
  - The full hspec suite (1052/1052) is green.
  - The handler logic and error mapping.

This commit is roughly LOC-neutral on its own (the `asks` insertions
in helpers offset the arg removals at call sites), but it is the
foundation that the next commit (optparse dedup) and any future
capability-style refactor (MCP Freer, finer Has-classes per endpoint,
applicative request validation) build on.

Categorically: lcaServer now lives in the Kleisli category of `Reader
AppEnv` lifted into Handler's Kleisli category. hoistServer is the
ServerT-functor's action on the natural transformation AppM ~> Handler.
Six tiny helpers (strOpt / optStrOpt / textOpt / optTextOpt / intOpt /
optIntOpt / strArg / textArg) collapse the four shapes that dominated
src/CLI/Parser.hs — Text option, Int option, positional arg, with or
without short alias — into one line each. ~172 LOC removed, `--help`
output unchanged, openapi.json byte-identical, hspec 1052/1052 green.

Categorically: optparse-applicative's Parser is the textbook free
Applicative over a primitive 'parser action'. Composition with <*>
keeps the static structure exposed, which is precisely why a Parser
can produce both the input parsing and the `--help` text from one
definition. Stopping at Applicative (instead of going to Monad) is
the structural reason these helpers can exist at all — a Monadic
parser would have to inspect run-time values, defeating the static
introspection.
Cleanup leftover after the AppM migration in commit d0a9b1f. The file
no longer references Servant's Handler directly — every former
`Handler X` signature is now `AppM X`, so the `Handler` symbol is
dead in the import list. Restores the build to the pre-branch warning
count (2: pre-existing MCP unused-import / unused-match).
Resolves the orphan / compilation-cycle deferral from C2. Each domain
data type now declares its own ToSchema next to its ToJSON / FromJSON
instance, derived via the same Stripped newtype that already carries
the JSON dictionaries.

Effect on src/API/OpenApi.hs: 293 → 144 lines. The module loses its
role as the orphan ToSchema collector and shrinks to a few special
cases — Value (untyped JSON), LocationKind (lowercase wire codes that
the generic schema would not surface), BinaryContent (OctetStream
format), plus the small handful of Database.Manager domain types
(MissingSupplier, DependencyStatus, DependencyChoice, LocationFallback,
LocationUnresolved, DatabaseSetupInfo) — and the enrichWithResources
post-processor that was always its real reason to exist.

Three categories handled:
  1. Records that already had Stripped JSON deriving — added ToSchema
     to the same `deriving (...) via (Stripped X)` clause.
  2. Sum types with default Generic encoding (TechRole, BioDirection
     in Types.hs; NodeType, EdgeType, FlowRole in API/Types.hs) —
     added `deriving anyclass (ToSchema)`.
  3. Polymorphic SearchResults a — standalone `deriving via (Stripped
     (SearchResults a)) instance ToSchema a => ToSchema (SearchResults
     a)` (the polymorphic context requires standalone).

Two custom ToSchema instances (ApiFlow with its discriminated `kind`
union; PerturbedEntry with its flattened Either) moved from OpenApi.hs
to API/Types.hs alongside their data declarations — same logic,
different home. The lens / openapi3 imports came along.

Net: −53 LOC across three files. openapi.json byte-identical; hspec
1052/1052 green.

Categorically: this is the same isomorphism transport (Coercible
witness) as the JSON commits, just for a third typeclass. With
ToJSON / FromJSON / ToSchema all derived through one newtype, the
"this type's wire form" definition lives in one place — what
Milewski calls /faithful representation/: the schema and the
serializer cannot drift, because they share their Generic Rep.
Drop the explicit DatabaseManager parameter from loadCollection and
crossDBSolutionFor; both now `asks getDatabaseManager` inside their
AppM body. Call sites lose the corresponding closure-captured arg.

Net delta is small (+2 LOC) — the body gains the `asks` line but each
call site loses one word. The win is architectural: every helper that
needs the database manager now declares it as a capability requirement
on the monad, not as a positional parameter on every signature, which
is the Has-pattern (capability-class projection) the prior commits
laid the foundation for.

The remaining helpers that still take DatabaseManager (prepMethodCtx,
buildPerDbSetTables, batchedScoresFor, computeCategoryResult,
mkMcpCrossDBEntry) run in IO, not AppM, so a similar conversion would
require either widening them to AppM (cascading change) or threading
a Reader instance through IO. Both belong to follow-ups (notably the
MCP Freer refactor, which would naturally bring the MCP-side helpers
into AppM as well).
Three handlers (addDependencyHandler, removeDependencyHandler,
setDataPathHandler) all had the same shape:

    result <- liftIO $ X
    case result of
        Left err -> throwError $ err400{errBody = BSL.fromStrict $ T.encodeUtf8 err}
        Right v -> return v

Extract that pattern as @ioEither400 :: IO (Either Text a) -> AppM a@
next to @simpleaction@. Each handler body shrinks from a 5-line
case-ladder to a single ioEither400 call.

The other Left→throwError sites in this file use different status codes
(404, custom response shapes like LoadFailed / RelinkResponse) and
don't fit a single helper, so they're left alone — there's no point
forcing a Procrustean abstraction.

LOC roughly break-even — the helper costs lines its callers save —
but the pattern is now named and reusable.
… Stripped

Give LocationKind a stable ToJSON (lowercase wire codes via the existing
locationKindCode) and migrate LocationFallback and LocationUnresolved to
attached `deriving (ToJSON, FromJSON) via (Stripped X)`. The custom
encodeFallback / encodeUnresolved helpers inside DatabaseSetupInfo's
manual ToJSON disappear — the parent now just serializes the field
directly and the structured serialization comes from the field type.

Categorically: pushing the ToJSON dictionary down to the leaf types
gives us the same property as `cata` in recursion schemes — the
encoder for a structure is the composition of encoders for its parts.
Aeson's deriving machinery delivers exactly that composition through
the Generic Rep.

Confirmed byte-identical at runtime: aeson preserves field-declaration
order in its Object; Stripped's stripLowerPrefix turns
{lfProduct, lfRequested, lfActual, lfKind} into
{product, requested, actual, kind} — the same keys the manual encoder
emitted, in the same order. openapi.json still byte-identical (it's
generated from the type-level schema, which was already custom for
LocationKind and unchanged for the records). Hspec 1052/1052 green.
The string-enum ToSchema for LocationKind (lowercase wire codes matching
locationKindCode) joins its ToJSON/FromJSON instances next to the data
declaration in src/Types.hs. Unblocks `deriving ToSchema via Stripped`
on LocationFallback and LocationUnresolved — the dependent schema lookup
now finds the LocationKind instance directly, not through an orphan in
API/OpenApi.hs that downstream modules can't see.

src/API/OpenApi.hs continues to shrink toward its real role: the
enrichWithResources post-processor plus a handful of genuinely-bespoke
schema instances (Value, BinaryContent, ApiFlow, the Database.Manager
domain types). Every "type with a generic stripped schema" now lives
next to its data declaration — the schema, the JSON encoder, and the
data definition share one source location.
…via Stripped

Migrate the three remaining hand-rolled ToJSON instances in
Database.Manager onto the Stripped DerivingVia carrier:

  - MissingSupplier: 9 lines of manual `A.object [...]` → 1 deriving
    clause. Wire format unchanged (msProductName → productName, etc.).
  - DependencyChoice: 7 lines → 1 deriving clause. Wire format
    unchanged.
  - DatabaseSetupInfo: 25 lines (with inline encodeFallback /
    encodeUnresolved / encodeCandidate helpers) → 1 deriving clause.

To make the last one work, the bespoke 3-tuple
@dsiAvailablePaths :: ![(Text, Text, Int)]@ is promoted to a proper
record @DaTa PathCandidate = PathCandidate { pcPath, pcFormat,
pcFileCount }@, which also derives via Stripped. The inline
encodeCandidate helper inside DatabaseSetupInfo's ToJSON disappears
along with the rest.

Runtime JSON: byte-identical. aeson's Generic ToJSON preserves field
declaration order, which matches the order the manual instances used.
Key names match (stripLowerPrefix on the prefixed Haskell field names
produces the same JSON keys the manual code emitted).

OpenAPI schema: changes deliberately. Before, `availablePaths` was a
positional 3-tuple — `[[string, string, int]]` — which is awkward in
generated clients and was already at odds with what the wire actually
emitted (an array of objects). After: the schema reflects the real
wire shape — `array<PathCandidate>` with named `path`, `format`,
`fileCount` properties. A new `PathCandidate` schema component is
added. The pre-existing inconsistency (schema said tuple, wire said
object) is now resolved in favor of the wire.

Categorically, this is the generic/specific decomposition the prior
commits established: the type-level Stripped carrier is the *generic*
machinery (one definition, one piece of code that handles
field-name-stripping for every record), and what each type declares
is purely *specific* — "I am a record with these fields". The
serialization, schema, and shape now share one source location per
type; the JSON encoder, the OpenAPI schema generator, and the data
declaration cannot drift apart.

Hspec 1052/1052 green.
…ependencyStatus

Three pre-existing schema/wire mismatches in Database.Manager — now
that the JSON instances are Stripped-derived, fix the schemas too:

  - MissingSupplier: schema previously advertised the Haskell field
    names (msCount, msProductName, msLocation, msReason, msDetail);
    the actual JSON has always emitted the Stripped names (count,
    productName, location, reason, detail). Schema now matches.
    Achieved by deriving (ToJSON, ToSchema) via (Stripped X) so the
    two share one source of truth.
  - DependencyChoice: same situation — schema said dchStatus etc.,
    wire emits status etc. Same fix.
  - DependencyStatus: schema previously declared the enum as
    [SelectedDep, AvailableDep, RedundantDep] (raw constructor names);
    the wire has always emitted [selected, available, redundant]
    (lowercase, via the hand-rolled ToJSON). Schema now declared
    explicitly as the lowercase string-enum that matches.

Wire JSON unchanged at every endpoint (the ToJSON instances produce
the same bytes as before; Stripped's stripLowerPrefix is bit-equivalent
to the manual A.object [...] form, in the same field order).

OpenAPI changes are deliberate corrections. Any tooling that read the
openapi.json to generate clients was previously broken for these three
types (it would expect Haskell-named fields that the server never
emits). The categorical pattern that makes this fix one-line-per-type
is the same isomorphism transport via Coercible the prior commits
established — pushing the wire contract into the type's Generic Rep
keeps the encoder, the decoder, and the schema in lock-step.
Replace the bespoke @mergeCrossDBStats@ / @emptyCrossDBLinkingStats@ pair
and the @combineStats@ helper with proper @Semigroup@ / @monoid@
instances. All call sites now speak the same vocabulary every other
stats accumulator in the codebase uses:

  - @foldr mergeCrossDBStats emptyCrossDBLinkingStats xs@  →  @mconcat xs@
  - @combineStats a b@                                       →  @A <> b@
  - @emptyCrossDBLinkingStats@                              →  @Mempty@
  - @mergeCrossDBStats a b@                                 →  @A <> b@

Categorically: both records are the *product of monoids in the category
of types*, derived componentwise from each field's monoid. Hand-written
rather than @via Generically@ because bare @int@ has no canonical
'Monoid' (Sum vs Product is ambiguous) and @(Int, LinkBlocker)@ as a
map-value mixes a monoid with a non-monoid — the explicit instance
keeps the public field types ergonomic (still @int@, not @sum Int@) at
the cost of a few lines of mechanical derivation.

This is the same pattern the prior commit applied to 'UnlinkedSummary',
applied now to the two other genuine stats accumulators in the codebase.
What remains as named @merge*@ functions are operations that are not
pure product-of-monoids: 'mergeUnitConfigs' has a "first-wins" field
mixed with last-wins maps; 'mergeSynonymDBs' renumbers group IDs;
'mergeTechFlows' / 'mergeBioFlows' are left-biased with one unioned
field. Those keep their named functions because the asymmetry would
be confusing under the @<>@ vocabulary.

Hspec 1052/1052 green.
The supplier-search algorithm in findSupplierInIndexedDBs was a nested
if-null ladder:

    exact = ...
    syn   = if null exact then ... else []
    pref  = if null exact && null syn then ... else []
    all   = exact ++ syn ++ pref

By the if-null guard, at most one of the three lists is non-empty —
the ++ is a sleight-of-hand for "the first non-empty wins". The same
pattern recurs one level deeper inside tryPrefixes (try each prefix's
exact + synonym sub-cascade; first match wins).

Both collapse cleanly to a single helper:

    firstNonEmpty :: [[a]] -> [a]
    firstNonEmpty = fromMaybe [] . find (not . null)

which is the First monoid on Maybe [a] (lift each candidate list into
Maybe via nonEmpty, combine with <|>, drop back) — collapsed because
that's the exact shape every match-strategy cascade in the module
wants. The cascade then reads as a priority-ordered list of strategies,
and the recursive tryPrefixes becomes a simple firstNonEmpty over
per-prefix sub-cascades, with the same helper applied at both levels.

Net behaviour identical (same inputs, same outputs, no perf change);
the algorithm's structure is now declared instead of unfolded.
`throwServiceError` was already defined as the single translator from
Service.ServiceError to HTTP error responses, but six handler bodies
and three internal helpers (withValidatedActivity, withValidatedFlow,
resolveOrThrow) still inlined their own ad-hoc case ladders. The
inlined translations had diverged over time — e.g. MatrixError was
mapped to err500 in three handlers but err422 in throwServiceError
itself (the comment near throwServiceError documents why err422 is
correct: it indicates a client-submitted invariant breakage).

Each case ladder collapses to `either throwServiceError pure
(Service.X ...)`. Net −40 LOC in src/API/Routes.hs.

Behaviour changes (all corrections, all toward the documented intent):
  - GET /graph, GET /supply-chain, GET /aggregate, GET /consumers:
    MatrixError now returns 422 with the inner message instead of 500
    with the same message. Aligns these endpoints with the rest of the
    cross-DB pipeline.
  - resolveOrThrow: previously returned 500 with `show e` for
    MatrixError / InvalidUUID / FlowNotFound; now goes through
    throwServiceError (422 / 500 with the inner message, respectively).
  - withValidatedFlow: the three-way M.lookup on tech/bio/waste flows
    becomes `asum [TechKind <$> ..., BioKind <$> ..., WasteKind <$> ...]`
    — the same Alternative-on-Maybe pattern firstNonEmpty captured for
    list cascades.

Categorically: the Either→AppM bind is `either throwServiceError pure`,
which is the Kleisli arrow that lifts a pure Service.X result into the
handler monad. Now used uniformly; the divergent inlines are gone.

Hspec 1052/1052 green. No test verifies specific status codes for the
shifted endpoints (only that MatrixError is surfaced as an error value
at all).
Every Servant handler that previously lived in `lcaServer`'s `where`-block
is now a top-level `:: AppM …` action that reads its env slice via
`asks aeDbManager` / `aeMaxTreeDepth` / `aePassword` / `aeHostingConfig`
/ `aeClassificationPresets`. `lcaServer` body collapses to
`hoistServer lcaAPI (runApp env) handlers`, with only the `:<|>` chain
left inside `where`. This unblocks non-Servant reuse (BatchImpacts etc.)
without the prior "hoist after the fact" gymnastics.

The five Has-* capability classes are deleted — only `HasDatabaseManager`
was used (at one site) and the others were dead code; `asks aeX` is
the same length without the class indirection.

Three near-identical `expandPreset + zipWith3` preset/explicit-filter
merges (consumers, supply-chain core, search) collapse to one
`mergeClassFilters` helper.

Trim categorical prose that the implementations don't exploit:
the "Coercible-witnessed iso / dictionary transport" paragraph in
`Stripped` (no coerce in the file), the Milewski reference in
`Validation` (replaced with the actual `ap = (<*>)` law that blocks
a Monad instance), the "Free Applicative over a primitive parser action"
paragraph on the optparse helpers (tautological w.r.t. the library),
the "product of four monoids" / "product of monoids, componentwise"
prose on `UnlinkedSummary` / `CrossDBLinkingStats` / `TreeStats`,
the long App.Env head doc, and the dead `mempty,` re-export from
`Database.Loader` (class methods are not exportable as symbols).

Net: 9 files, +1022/-1142. `cabal build all` clean, lca-tests
1052/1052 green, `dump-openapi` byte-identical (md5 matches PR head).
…scade

Two patterns at the heart of the LCIA mapping kernel that the previous
code spelled out the long way.

1. MappingStats becomes a Monoid (field-wise sum, all-zero identity).
   computeMappingStats was seven independent walks over the mapping list
   (one `length mappings` + five `length . filter ((== Just s) . ...)` +
   one `length . filter (isNothing . snd)`). It is now one `foldMap`
   pass with an exhaustive case on MatchStrategy.

   - 7×|mappings| traversals → 1×|mappings|.
   - The strategy switch is exhaustive: adding a new `MatchStrategy`
     constructor is now a compile error in `tally` until it gets a row.
     The old `(== Just strategy) . fmap snd . snd` form silently
     under-counted any new variant.
   - MappingStats now composes via `<>` and `mconcat`, useful for any
     future cross-method or cross-DB aggregation.

2. lookupCascadeCF was a hand-unrolled UUID → exact → fallback ladder
   nested four `case Maybe` levels deep. With Maybe's Alternative the
   cascade IS the algorithm:

       M.lookup fid (mtUuidCF tables)
         <|> (M.lookup fid flowDB >>= byName)

   where `byName` ends with

       M.lookup (name, baseMed, normSub) (mtExactCF tables)
         <|> M.lookup (name, baseMed)    (mtFallbackCF tables)

   No behaviour change. Five fewer indentation levels in the body, and
   the priority order reads top-to-bottom on the page.

Tests: 1052/1052 hspec green, openapi.json byte-identical (md5 matches).
@ccomb ccomb changed the title Category-theoretic refactor: DerivingVia / Validation / Monoid / ReaderT / free Applicative refactor: DerivingVia / Validation / Monoid / ReaderT / free Applicative May 27, 2026
ccomb added 7 commits May 27, 2026 11:30
…thMatrices

The function was a ~220-line do mixing UUID interning, two matrix
builders, the product index, and the final Database assembly. Every
pure phase moves into a new Database.MatrixBuild; the IO entrypoint
becomes a thin log → call pure helpers → assemble record sequence
(-186/+27 in Database.hs).

Patterns line up with the ones 666613d named for the mapping kernel:

- findProducer is Alternative on Maybe — the cascade IS the algorithm:

      exchangeProcessLinkId ex
        <|> (exchangeActivityLinkId ex
              >>= \a -> M.lookup (a, exchangeFlowId ex) lkp)

  Replaces a four-level nested case Maybe ladder.

- buildTechTriples uses traverse over (Either Text), which
  short-circuits on the first unit-conversion error. Tuple-Monoid + fold
  accumulates (triples, warnings) without partitioning Lefts and Rights
  by hand.

- perActivity flattens (activity, exchange) iteration once. Tech and bio
  each carried a near-identical "fetch activity / key / normFactor / fold
  exchanges" wrapper; they now share it. Drops the dead _activity arg
  from the bio builder along the way.

- Misc: safeDenom names the divide-by-zero guard used by both matrices;
  the double convertUnit call collapses into one match on
  (needsConversion, convertUnit ...); S.toAscList replaces
  sort . S.toList . S.fromList.

Behaviour identical. Tests: 1052/1052 hspec green; the matrix log lines
(activity count, technosphere / biosphere non-zero entries) match the
pre-refactor numbers on every fixture.
The 4-case classifier in applySubstitutionsAt dispatched on a tuple of
booleans (fromDb == thisDbName, toDb == thisDbName) and interleaved
matrix-perturbation arithmetic with IO calls into perturbA. The cases
were unnamed and the rank-1 update was reconstructed inline in each
branch.

Introduce three substitution-domain ADTs:

* Endpoint = Here ProcessId UUIDs | Elsewhere DepRef — replaces the
  tuple-of-bools dispatch with named, exhaustive constructors.
* DepRef — the (dbName, db, pid, uuids) bundle the cross-DB helpers
  consume; replaces today's anonymous 4-tuple.
* RankOneUpdate — the planned (consumerPid, perturbation, extras) value
  that planUpdate produces and applyRankOne consumes.

planUpdate is now a pure four-clause Either, one clause per case, with
findTechCoefficient / findStaticCrossDBLink lifted into requireTech /
requireStatic helpers that surface the existing 422-mapped MatrixError.
applyRankOne shrinks to a one-line mapM over perturbA. resolveRef
becomes resolveEndpoint, returning the named Endpoint.

No behaviour change: substitution + cross-DB specs still pass (44
examples). Helpers remain in the where-block; foldM/ExceptT cleanup
and lifting to top-level follow in the next commits.
Replace the hand-written applyAll/applySub recursion (manual Left
propagation, threaded accumulators) with foldM in ExceptT ServiceError
IO. One step function now sequences the three Either/IO operations
(resolve consumer, resolve from/to endpoints, plan the rank-1 update,
apply it) using monadic bind, so the four-case (Left e, _) ladder
collapses to natural short-circuit on the first failure.

Resolution order (from before to) and the call-once placement of
getFactorization are preserved verbatim; both were called out in the
plan as behaviour-sensitive.
Brings in 7 main commits since branch point (3efa792), including the
orphan-waste linker (#86), columnar score_activities (#85), MCP web_url
gating (#89), pyvolca client updates (#88/#90/#91), and SimaPro
split-location fix (#84).

Four files had conflicts where the branch's refactors (Stripped JSON
derivation, CrossDBLinkingStats Monoid) interleaved with main's new
fields/types:

- src/Types.hs: extended the Semigroup/Monoid instance with main's three
  new waste counters (cdlWasteExactLinks, cdlWasteAmbiguous,
  cdlCutoffWasteCount). Dropped the now-redundant emptyCrossDBLinkingStats
  and mergeCrossDBStats — mempty and (<>) replace them.

- src/Database/Loader.hs: took main's wasteFlowDb parameter and waste
  treatment matching, expressed via mempty/mconcat instead of
  emptyCrossDBLinkingStats/mergeCrossDBStats to stay consistent with the
  Monoid refactor.

- src/API/Types.hs: deleted the hand-written JSON instances per the
  Stripped-via-DerivingVia refactor; gave CutoffWasteFlow (added on main)
  its own deriving (ToJSON, ToSchema) via (Stripped CutoffWasteFlow)
  clause. Also restored LCIABatchResult's deriving clause that
  auto-merge had attached to CutoffWasteFlow instead.

- src/API/OpenApi.hs: deleted the hand-written ToSchema instances per
  the same refactor; CutoffWasteFlow's ToSchema rides on its data-decl
  deriving clause.

Verified: cabal build clean, 1094 / 0 failures (was 1052 + 42 new from
main's specs: WasteCrossDB, MCPColumnar, MCPEnrich extensions).
Brings in two main commits that landed during the previous merge:

* 86c8a0e — MCP exposes source-native activity_type
* af6eaff — pyvolca 0.5.0 typed returns

Same pattern as the first merge: the branch's Stripped-via-DerivingVia
refactor interleaves with main's new NativeActivityType wire type.
Resolved by keeping HEAD's slim/derived shape and co-locating
NativeActivityType's hand-rolled JSON + Schema instances in API.Types
next to the data declaration's wire neighbours.

NativeActivityType ToSchema moved from API.OpenApi to API.Types because
ActivitySummary and ActivityForAPI now derive ToSchema via Stripped and
contain a NativeActivityType field; the instance must be in scope at
API.Types compile time to avoid a circular dep on API.OpenApi.

LANGUAGE pragmas: kept both DerivingVia (branch) and LambdaCase (main).

Verified: 1107 / 0 failures (was 1094 + 13 new from the activity_type spec).
…Alternative

The 91-line where-clause inside convertActivityForAPI re-matched the
Exchange variant six times (three parallel flow lookups, one target
resolver, one flow-name merger) and threaded a correlated
(Maybe, Maybe, Maybe) triple that could drift apart in principle.

Refactor:

- TargetRef record replaces the correlated triple — either all three
  fields are present or none. Project to the wire-format Maybes at the
  boundary only.
- resolveTarget / resolveFlow each do one \case over Exchange,
  eliminating the per-side empty-Nothing repetitions.
- Three named resolvers (byActivityUUID / byProductFlow / byCrossDBLink)
  return Maybe TargetRef; the SimaPro fallback chain is now <|>.
- Two duplicated 5-line projections (Activity -> triple, CrossDBLink ->
  triple) extracted as activityToTarget / crossDBLinkToTarget.
- crossDBLinkMap lifted out of the where to module-level
  buildCrossDBLinkMap.

Semantics preserved: technosphere broken-link (linkId set but
unresolvable) returns Nothing rather than falling through to the
product-flow path, matching the original guard ladder.

Verified: cabal build clean, cabal test 1107 / 0 failures.
…atch

The where-clause repeated the three-arm Exchange/flow-lookup ladder and
re-built the cross-DB link map by hand, mirroring the same shape as
convertActivityForAPI's old where-clause.

Lift the variant dispatch into a single 'FlowKind'-returning lookup:

- Types.hs: lookupExchangeFlow :: Database -> Exchange -> Maybe FlowKind
  plus flowKindCompartment :: FlowKind -> Maybe Compartment (alongside
  the existing flowKind* projections). One arm per variant, exhaustive.

- Service.hs: getActivityExchangeDetails is now four lines (filter, map);
  toExchangeDetail composes lookupExchangeFlow + apiFlowOfKind +
  flowKindUnitName instead of three parallel case arms. Cross-DB link
  resolution reuses the shared buildCrossDBLinkMap. crossDBLinkToSummary
  + resolveTargetSummary mirror the TargetRef helpers from the previous
  commit, with '<|>' for the getTargetActivity / cross-DB fallback chain.

- unresolvedUnit lifted to module-level constant (no longer recomputed
  per call).

- resolveFlow (introduced last commit) also collapses via
  lookupExchangeFlow + flowKindCompartment, so the two refactors
  converge on the same primitive.

Semantics preserved: biosphere exchanges still return Nothing target;
the cross-DB fallback fires only when getTargetActivity fails.

Verified: cabal build clean, cabal test 1107 / 0 failures.
ccomb added 3 commits May 27, 2026 23:29
Three graph-building functions in src/Service.hs each carried a deep
where-clause or do-block. The shapes were similar (per-variant node
construction, accumulator folds, biosphere-at-root special-casing) but
duplicated. Extract small helpers so each function reads top-to-bottom.

extractBiosphereNodesAndEdges:
  - mkBiosphereExportNode: pure ExportNode constructor
  - mkBiosphereTreeEdge: pure TreeEdge constructor with direction
  - maxBiosphereFlows: lift the magic 50 to a named top-level constant
  - Inner fold step is now four lines

extractNodesAndEdges:
  - mkActivityExportNode: shared between TreeLeaf and TreeNode (the two
    constructors that wrap an Activity)
  - mkLoopExportNode: TreeLoop-specific
  - mkTechnosphereTreeEdge: pure constructor for child edges
  - withRootBiosphere: collapses the 'if depth == 0 then bio else id'
    block that appeared in both TreeLeaf and TreeNode arms
  - Each arm is now ~5 lines

buildActivityGraph:
  - selectSignificantActivities: pure helper, no longer inlined; also
    drops a partial '!!' in favour of a comprehension-guard
  - isInputLinkTo: exhaustive predicate, replaces nested case-of inside L.find
  - mkGraphEdgeFromTriple: uses Maybe-do to short-circuit when either
    endpoint is below cutoff (avoids computing flow/unit fields that
    will be thrown away)
  - mkGraphNode: drops the 'error \"Invalid ProcessId in graph\"' partial
    function in favour of a sentinel node (Vector !? + fallback) —
    matches the "no silent errors, no runtime crashes" rules
  - Body shrinks from ~115 lines of inlined steps to ~13 lines of
    pipeline assembly
  - Switches the edge list-comp + 'Just e <- edges' filter to mapMaybe

Verified: cabal build clean, cabal test 1107 / 0 failures.
runServerWithConfig was a 70-line staircase doing eleven things;
createServerApp wrapped a 50-line inner closure that mixed routing,
static-file serving and SPA fallback. Each step now has a name.

runServerWithConfig (~15 lines now):
  - applyLoadOverride: pure, applies --load to the config
  - logLoadedDatabases: collapses the if/M.null branch
  - resolvePassword: 3-tier CLI -> config -> env fallback
  - logServerStartup: desktop-mode vs human-banner split
  - setupIdleTimeout: allocates refs and (optionally) forks the watchdog
  - wrapWithMiddleware: idleTracking . shutdown . (maybe authMiddleware)

createServerApp (~10 lines now):
  - swaggerHtml: lifted to top-level constant
  - spaStaticSettings: the wai-app-static config
  - serveStripped / serveSpaIndex: the two static handlers, each now a
    standalone Application instead of let-bindings inside a closure
  - dispatchRequest: the path-based routing if/guards, parameterised on
    pre-built mcp + Servant apps so the inner closure shrinks to two lines
  - logRequest: per-request stdout log line

Behaviour preserved (cabal test 1107 / 0 failures). Each new helper has
a doc comment and is independently understandable.
…ties

The (ProcessId, Activity) -> ActivitySummary projection was inlined in
five places — search results, supply-chain entries, the products index,
the technosphere navigation helper, and the per-flow consumer list — each
running the same getReferenceProductInfo + activity-allocation +
native-type plumbing. Extract once, replace at every call site that uses
'dbUnits db' as the unit DB.

New helpers:
  - mkActivitySummary :: Database -> ProcessId -> Activity -> ActivitySummary
  - unknownActivitySummary :: Database -> ProcessId -> ActivitySummary
    (surfaces a typed placeholder when the products index points at a
    dropped Activity, instead of silently weaving Nothings through the
    record)

searchActivities body shrunk from ~55 to ~16 lines:
  - tryBm25Retrieve uses 'do' + 'guard' over Maybe instead of nested
    case-with-guards, expressing the four predicates as a flat chain
  - activityRowComparator extracted to a named top-level (the inline
    case-of needed two-level matching to compare per-tuple)
  - paginateSearchResults wraps the offset / limit / hasMore arithmetic
    plus the per-row projection, returning the wire-format SearchResults
    directly — the per-call boilerplate disappears

getActivitiesUsingFlow, getTargetActivity, getAllProductsForActivity
each collapse to a couple of lines.

Verified: cabal build clean, cabal test 1107 / 0 failures.
ccomb added 13 commits May 28, 2026 17:50
## Summary

`EcoSpold.Parser1` carried two ~95% identical `X.fold` parsers —
`parseWithXeno`
(first dataset) and `parseAllWithXeno` (all datasets) — including
byte-for-byte copies
of `buildExchange` and `buildResult`. This consolidates them and
flattens the
"triangular" handlers, with no behavior change.

- **One shared handler set.** All SAX callbacks plus
`buildExchange`/`buildResult` are
lifted to top level. Both public functions become thin drivers over a
single
`foldEcoSpold1`, differing only in how they read the final `ParseState`.
- **Flat guarded setters.** The per-field `if isElement … then … else
<old>` attribute
updates become guarded single-field setters (`setRefFunctionAttr`,
`setExchangeAttr`).
- **`buildExchange`** is now three top-level guards (`isWasteFlow` /
`isBiosphere` /
  technosphere) over a shared `where`, instead of a nested-if pyramid.
- **Close handlers** gain `popPath`/`popElement` helpers, removing
repeated
  path/text-accumulator boilerplate.
- **Exhaustive matches.** Every `ElementContext` and `Exchange` match is
now total
  (no wildcards), so incomplete matches are caught at compile time.

Net: 811 → 556 lines (−255), no new dependencies beyond
`Data.Bifunctor`.

## Test plan

- [x] `cabal build` — clean, no incomplete-pattern warnings
- [x] `cabal run lca-tests -- --match "/EcoSpold1/"` — 48/48 pass (error
cases,
      comments, waste-on-inputGroup=5, multi-dataset)
- [x] Full suite — 1107 examples, 0 failures (LoaderSpec exercises the
EcoSpold1 path)
## Summary

Refactors `SimaPro.Parser.processBlockToActivity` (the SimaPro
`ProcessBlock` → `Activity` conversion) and the parameter plumbing
feeding it. Pure readability/structure work — **no change to parsing
output**.

- **Parameter environment**: the six-step `env0 → env4 → env` staircase
becomes a single pure `buildParamEnv` that folds over ordered parameter
groups — one pass for input params, fixed-point iteration for calculated
params (forward references). It's now a reusable, independently testable
function.
- **Row decomposition**: the ~40-line block of hand-written
triple-projection lambdas (`\(e,_,_) -> e`, …) becomes `unzip3` +
`catMaybes`. The artificial `techTriples`/`treatmentTriples` split is
gone (they were always concatenated). The per-product body moved into a
`where`-clause and the two loop-invariant activity fields
(`description`, `nativeType`) are hoisted out of the per-coproduct
lambda.
- **Param plumbing**: database/project params now travel as a named
`GlobalParams` record (with a `Monoid` instance) plus a `WorkerResult`
record for `parseWorkerLines`, replacing four identically-typed
positional `[(Text,Text)]` tuples. The parallel-worker merge collapses
to `foldMap`, and the four same-typed lists can no longer be swapped by
mistake.

Net effect: `processBlockToActivity` drops from ~130 lines / 4 nesting
levels to ~90 lines / mostly 2 levels, and the parameter logic becomes
reusable units.

## Test plan

- [x] `cabal build` (library + executable) — clean
- [x] `cabal test lca-tests --test-options='--match "SimaPro"'` —
111/111 pass
- [x] `cabal test lca-tests` — 1107 examples pass (the only failures
were `Server`/`Routes` integration tests that spawn the `volca` binary;
they pass once the executable is built)
- [x] Behaviour-preservation verified against existing coverage:
coproducts & shared activityUUID, allocation % + formula, per-coproduct
exchange scaling, db/project/process param envs, final-waste routing,
all biosphere sections, native process type
…96)

## Summary

Continues the PR #87 line of cleanup, this time on
`Method.ParserSimaPro` — the
12-phase fold that parses SimaPro LCIA method CSV exports.

The phase lived in a `Phase` enum next to a flat 13-field `ParseState`
whose
per-section accumulators were always present, so the type permitted a
phase to
disagree with the data it held. This change folds phase + accumulator
into a
single `Stage` sum type whose constructors carry exactly what each phase
needs,
making those impossible states unrepresentable. It also factors out the
duplicated leaf logic and removes a partial function.

## What changed

- **`Stage` ADT** replaces `Phase` + the flat accumulator fields.
`CatAccum` /
`DamageAccum` / `NWAccum` hold the in-progress block; completed lists
and the
constant methodology name move to a slim `ParseState` (so `step` drops
its
  threaded methodology argument).
- **Shared helpers** pulled out of the `step` `case`: `parseCFRow` (the
inlined
  ~35-line CF builder), one `parseNameValue` for the three identical
`name;value` rows, `parseNameUnit` for both `Name;Unit` headers,
`buildMethod`,
and `detectMarker`/`stageFor` mirroring the sibling
`SimaPro.Parser.detectSection`.
- **`finalize` collapsed** from re-implemented builders to a 6-line
read-out that
reuses the same `finishCat`/`finishDamage`/`finishNW` finishers as
`step`, via
  one `finishCurrent` dispatcher.
- **Partial function removed**: `parseNameUnit` is total, deleting
`head'` and its
  `error` call.

## Behaviour

Output is unchanged on well-formed SimaPro files (verified by tracing
the
fixture through every transition). Three harmonisations are unreachable
on valid
exports: empty blocks are no longer emitted, and a truncated file
lacking a
terminal `End` now keeps its trailing block and its methodology instead
of
dropping them.

Net: `216 insertions, 233 deletions` in one file.

## Test plan

- [x] `cabal build lib:volca` — clean, no incomplete-pattern warnings on
the new `Stage`/`step`
- [x] `cabal run lca-tests -- --match "/Method"` — 115 examples, 0
failures (14 SimaPro method cases + standalone NW parser among them)
## Summary

`parseWithXeno` was a single ~465-line function. Its `closeTag` handler
held two near-duplicate ~115-line exchange-finalization blocks plus a
dozen field setters, all repeating the same three idioms: text harvest,
path pop, and per-kind `InIntermediateExchange`/`InElementaryExchange`
dispatch.

This extracts small top-level pure combinators so each block now carries
only its domain decision logic:

- **idioms** — `accumText`, `popPath`, `popText`, `pathAt`
- **dispatch** — `onExchange` (over `mapExchange`); the exhaustive
`ElementContext` matches live in `currentIntermediate` /
`currentElementary` / `inGeneralComment`, so call sites stay
wildcard-free
- **exchange close** — `mkUnit`, `resolveGroups`, `missingUnitWarning`,
`parseUUIDOrNil`, `finishExchange`, and the `add*` state pushes
- **plumbing** — result handling becomes an `Either`-monad do-block
(`Data.Bifunctor.first`); `buildResult`'s tail becomes an `fmap`

Combinators are kept allocation-neutral (no lens on this hot strict
fold). Net **−29 lines**, single file.

## Test plan

- [x] `cabal build` clean — only the 2 pre-existing MCP warnings, none
from this file
- [x] `cabal test` — 1107/1107 hspec green, 1 pending (unchanged from
baseline)
- [x] EcoSpold2 group green: per-exchange comments, property-comment
isolation, waste patterns A & B, native activityType, malformed-input
`Left`
The tool handlers each re-inlined the same wrapper, DB lookup, and
error-lifting boilerplate. Factor it into four small primitives —
runTool (the runExceptT/toolError envelope), requireDatabase, liftShow,
and transformers' own `except` in place of hand-rolled `ExceptT . pure` —
and migrate the already-ExceptT handlers onto them.

Behaviour-preserving: same toolError/toolSuccess outputs, same
short-circuit order. Removes one do-block nesting level from every
converted handler (hence the indentation churn).
The case-cascade handlers nested up to eight levels deep re-deriving DB
lookups, method resolution, and Either unwrapping by hand. Rewrite them
as flat `runTool rid $ do` blocks over requireDatabase / except /
liftShow: getSupplyChain, getFlowMapping, getCharacterization,
getActivity, listGeographies, plus the withDb helper and the
getPathTo / getConsumers handlers.

getFlowMapping now reuses resolveMethod instead of re-implementing the
UUID-parse-and-filter lookup (identical error messages).

Behaviour-preserving: same toolError text and the same short-circuit
order. buildUnmatchedDbFlows keeps its empty-list fallback (an empty
ranking is not an error), so it stays a plain IO helper.
- Factor the classification-filter construction (preset lookup + explicit
  classification/value args) into explicitClassFilter and
  classificationFilters, shared by the search, consumers, and
  supply-chain handlers instead of three near-identical inline copies.
- Drop the two local `<|>` rebindings in the search handlers; they
  re-implemented the Alternative Maybe instance now imported directly.
- Collapse a `case … of Right s -> Right s; Left e -> Left e` (an
  open-coded identity on Either) in compute_sensitivity's scoreOf.

Behaviour-preserving: supply-chain keeps explicit-only filters (it has
no preset arg), and `<|>` on Maybe matches the deleted definitions.
A category header with zero CF rows (or a damage category with zero
impact rows) emits its Method/DamageCategory again, as before the
Stage-ADT refactor. A category can be declared before its factors are
added; dropping it silently would hide that from downstream.
When the root activity falls below the cutoff threshold it is prepended
(becoming node 0) rather than left at its natural index, restoring the
node ordering and id mapping from before the cutoff refactor. Its value
is read from the supply list via lookup, so no partial indexing.
Drop the mkAppEnv wrapper: five positional arguments with two adjacent
Maybes are easy to transpose. Call sites now name every field.
An all-lowercase field has no Type-prefix boundary, so dropWhile isLower
would produce an empty JSON key. Fall back to the original label.
#98)

## Summary

A readability pass over `SharedSolver.hs` that removes duplicated logic
without changing behaviour.

- **Factorization caching deduped.** The lazy MUMPS factorization was
computed-and-cached in two near-identical blocks
(`solveWithSharedSolver`, `ensureFactorization`). Extracted
`computeAndStoreFactorization` so the precompute + store lives once; the
dense-solver fallback stays scoped to the first-solve (cache-miss) path
exactly as before.
- **Shared dep-demand helpers.** The per-root dep-demand → demand-vector
conversion was copied byte-for-byte in three resolvers (`resolveDep`,
`resolveDepContribs`, and `Service.resolveDepWithSubs`). Extracted
`prepareDepDemandVecs`, plus `depDbsOf` for the "DBs referenced at this
level" set.
- **`Semigroup CrossDBSolution`.** `mergeSolutions` is now `foldl' (<>)`
— same summed inventory and base-first / BFS-ordered scalings as the
hand-written version.
- **Named map types.** Introduced `SupplierDemands` / `DepDemands`
aliases in `Matrix` (where the type is produced) and used them across
the affected signatures, replacing the verbose nested-`Map` spelling.

We deliberately stopped short of unifying the two dependency-graph walks
(inventory vs. contributions) behind a generic interpreter: it barely
reduces line count, would rely on a K=1 invariant the types don't
enforce, and can't absorb the third walk in `Service.hs`.

## Test plan

- [x] `./gen-version.sh && cabal build` — clean (only two pre-existing,
unrelated `API/MCP.hs` warnings).
- [x] `cabal run lca-tests` — 1107 examples, 0 failures, 1 pending.
- [x] Rebased onto current `advanced_haskell`; rebuilt and re-ran the
full suite green on the rebased tree.
…off (#99)

## What

The cut-off post-processing — reducing a multi-output dataset to a
single reference product so the engine sees a single-output process —
was duplicated **verbatim** between the EcoSpold1 and EcoSpold2 parsers
(~70 lines, the same eight-function chain in each). This moves that
block into one shared `EcoSpold.Cutoff` module that both parsers import.

This is the last remaining piece of the EcoSpold refactor series (after
the Parser2 split #97 and the Parser1 fold dedup #94), which left the
cut-off block still copy-pasted in both files.

## Changes

- **New `src/EcoSpold/Cutoff.hs`** — holds `applyCutoffStrategy`,
`hasReferenceProduct`, `removeZeroAmountCoproducts`,
`assignSingleProductAsReference`, `isProductionExchange` (exported) plus
the internal `updateReferenceProduct` / `markAsReference` /
`unmarkAsReference`.
- **`Parser1.hs` / `Parser2.hs`** — delete the duplicated block, `import
EcoSpold.Cutoff (applyCutoffStrategy)`. Parser1 drops its "exported for
testing" cut-off helpers.
- **`EcoSpold1Spec.hs`** — imports the cut-off helpers from
`EcoSpold.Cutoff` instead of the old `EcoSpold.Parser1` re-export.
- **`volca.cabal`** — registers the new library module.

Pure post-fold logic, runs once per activity (not in the SAX loop), so
no behaviour or throughput change. Net: ~154 duplicated lines removed
across the two parsers, replaced by one ~95-line module.

## Verification

- `cabal build lib:volca` + `exe:volca`: clean (only pre-existing
unrelated MCP warnings).
- `cabal test lca-tests`: **1107 examples, 0 failures, 1 pending** —
including the EcoSpold1 cut-off-helper tests now resolving against the
new module.
@ccomb ccomb changed the title refactor: DerivingVia / Validation / Monoid / ReaderT / free Applicative refactor: behavior-preserving cleanup sweep across the engine May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant