Skip to content

Introduce explicit I/O namespace#467

Merged
Krzmbrzl merged 32 commits intoValeevGroup:masterfrom
Krzmbrzl:namespace-parse
Feb 7, 2026
Merged

Introduce explicit I/O namespace#467
Krzmbrzl merged 32 commits intoValeevGroup:masterfrom
Krzmbrzl:namespace-parse

Conversation

@Krzmbrzl
Copy link
Collaborator

@Krzmbrzl Krzmbrzl commented Jan 26, 2026

See individual commits.

TODO:

  • Make options take syntax version param
  • Provide explicitly namespaced/versioned access to parse/deparse
  • Implement defaulted parse/deparse
  • Remove wolfram output
  • Merge wstring.hpp into utility/string.hpp
  • Remove stray to_string overloads - support for that is quite inconsistent and user's should prefer using explicit I/O functions
  • Add io namespace for all text<->object conversions

@Krzmbrzl Krzmbrzl force-pushed the namespace-parse branch 5 times, most recently from 2300ab4 to 73481f8 Compare January 27, 2026 12:33
@Krzmbrzl Krzmbrzl marked this pull request as ready for review January 27, 2026 14:32
@Krzmbrzl Krzmbrzl requested a review from evaleev January 27, 2026 14:56
@Krzmbrzl Krzmbrzl force-pushed the namespace-parse branch 4 times, most recently from eec7765 to 50c761e Compare February 2, 2026 15:15
@Krzmbrzl Krzmbrzl marked this pull request as draft February 2, 2026 15:29
@Krzmbrzl Krzmbrzl changed the title Revamp parse function interface Introduce explicit I/O namespace Feb 4, 2026
@Krzmbrzl Krzmbrzl force-pushed the namespace-parse branch 2 times, most recently from b230c6e to 0f47709 Compare February 4, 2026 15:12
@Krzmbrzl Krzmbrzl requested a review from evaleev February 4, 2026 15:50
@Krzmbrzl Krzmbrzl marked this pull request as ready for review February 5, 2026 08:30
@ajay-mk
Copy link
Member

ajay-mk commented Feb 5, 2026

Will this PR be included in the 2.1 release? I am asking because MS1 has an example using parse_expr, which we should update.

@Krzmbrzl
Copy link
Collaborator Author

Krzmbrzl commented Feb 5, 2026

Generally speaking, I would have thought so. The main motivation for this was to have versioning capabilities for the serialization format 👀

@evaleev
Copy link
Member

evaleev commented Feb 6, 2026

Generally speaking, I would have thought so. The main motivation for this was to have versioning capabilities for the serialization format 👀

Yes, it will be good to have in 2.1 .

Copy link
Member

@evaleev evaleev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is produced by Claude 4.6, with my annotations. Some points can be ignored but keeping for posterity.

High severity

  1. is_tensor concept requires I/O capabilityabstract_tensor.hpp now requires { io::latex::to_string(obj) } -> std::convertible_to<std::wstring> in the is_tensor concept. This couples the fundamental tensor interface to the I/O layer, inverting the dependency direction. A type should be a "tensor" based on its mathematical properties, not on whether it can produce LaTeX. Consider a separate is_latex_renderable_tensor concept or removing this requirement from is_tensor.

  2. I DON'T THINK THIS IS IMPORTANT: SerializationError base class change — Old ParseError inherited from std::runtime_error; new SerializationError inherits from Exception : std::exception. Downstream code catching std::runtime_error& will silently miss these errors.

  3. I DON'T THINK THIS IS NECESSARY: No backward-compat shims for parse_expr/deparse — The old parse_expr(wstring_view, ...) and deparse(ExprPtr, ...) free functions are completely gone with no deprecation wrappers. All downstream code using these will break immediately. The shorthands header provides deserialize/serialize but not the old names.

  4. Stale #undef macros in serialization.cpp — The file undefs SEQUANT_RESOLVE_PARSE_FUNC and SEQUANT_RESOLVE_DEPARSE_FUNC which were never defined. The actually-defined macros (SEQUANT_RESOLVE_DESERIALIZATION_FUNC, RESOLVE_SERIALIZE_FUNC) leak past the end of the file.

Medium severity

  1. Include guard namingserialization.hpp reuses SEQUANT_PARSE_HPP (the old guard). v1/ast.hpp has mismatched guards (SEQUANT_CORE_PARSE_V1_AST_HPP at #ifndef vs SEQUANT_CORE_PARSE_AST_V1_HPP at #endif). These should match the new file paths.

  2. to_latex(ExprPtr) may not work without dereferencing — The old to_latex(const ExprPtr&) from expr_algorithms.hpp was removed. The new io::latex::to_string has a pointer_can_call_to_latex concept overload, but tests were updated from to_latex(e) to to_latex(*e), suggesting the pointer overload may not be working correctly. Worth verifying.

  3. is_tensor changed from trait to conceptis_tensor_v<T> and is_tensor<T>::value are gone. Any downstream std::enable_if_t<is_tensor_v<T>> usage will break.

  4. to_string/to_wstring renamed to toUtf8/toUtf16 — No deprecated aliases provided for the old names.

  5. { label(obj) } -> std::constructible_from<std::wstring> inconsistent — Every other constraint in the is_tensor concept uses std::convertible_to<T>, but this one uses std::constructible_from<T> (more permissive, allows explicit conversions). Should be std::convertible_to<std::wstring> for consistency.

  6. to_string overload ambiguity risk — The three io::latex::to_string overloads (for has_to_latex_member, pointer_can_call_to_latex, and _to_latex()) are not fully mutually exclusive. A type with both to_latex() and _to_latex() would match overloads 1 and 3, with resolution depending on reference binding (T&& vs const T&). This is fragile.

Low severity

  1. Wrong namespace closing comment in serialization.hpp — Says } // namespace sequant, should be } // namespace sequant::io::serialization.

  2. Removed diagnostic output in ITFitf.hpp removed std::wcerr << deparse(product) before a throw rather than updating to the new API. Loses debugging info.

  3. #if 0 dead code in utility/string.hpp — The old to_string/to_wstring templates are disabled but not removed.

  4. python/test.py added — Appears unrelated to the I/O namespace refactoring. Accidental inclusion?

  5. Pre-existing sign bug preserved in serialize_scalar — The imagNumerator < 0 ? " + i " : " - i " logic appears inverted for positive imaginary parts (the subsequent .str() includes its own sign, so 1 + 3i would serialize as "1 - i 3"). Carried forward from old deparse_scalar — not introduced by this PR but the refactor was an opportunity to fix it.

  6. Tabs in macro block in serialization.hpp v1 namespace forward-declarations — inconsistent with the rest of the codebase.

@evaleev
Copy link
Member

evaleev commented Feb 6, 2026

also: how about defaulting target type T to ExprPtr to shorten deserialize<ExprPtr> and from_string<ExprPtr> to deserialize<> and from_string<>? These will appear in all tutorials/slides etc. so every few shaved chars will matter.

@evaleev evaleev added this to the 2.1 milestone Feb 6, 2026
Copy link
Member

@evaleev evaleev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Krzmbrzl
Copy link
Collaborator Author

Krzmbrzl commented Feb 6, 2026

is_tensor concept requires I/O capability

This is just a 1-to-1 translation of the previous requirement. I'm happy to drop it though (latex::to_string will now always be available due to the presence of the _to_latex member function)

to_latex(ExprPtr) may not work without dereferencing

It does - the change to the test cases happened at a time when I didn't have the overload for ExprPtr yet (and thought I might not need it). It was required in more places though, so I added the overload afterwards.

to_string/to_wstring renamed to toUtf8/toUtf16

That was intentional - staying consistent with the meaning of those functions in the std namespace, they are now only used for converting non-string-type objects into strings. All transformation on string types actually deal with encoding changes and hence that should be made implicit. In order to keep generic templates working, I added null-op overloads for the toUtf functions.

In fact, I would like to remove the remaining to_(w)string functions in favor of implementing support for std::format, which will allow for a more general (and hopefully also more consistent) framework for converting objects to strings. But that will be done in a separate PR.

{ label(obj) } -> std::constructible_fromstd::wstring inconsistent

That is due to the choice of making label return a string_view rather than a (const) reference. Views are not implicitly convertible to strings and hence I needed the more general requirement.

Actually, this is something that has been slightly bugging me for some time. Imo using std::string_view as a return value can be somewhat dangerous. Consider the following snippet:

auto name = tensor.label();

Looking at this, you might think that you have now an independent copy of the tensor's label. However, if label returns a view, you have a dangling reference as soon as tensor goes out of scope (which is not what you'd expect if you think you made a copy).

This is arguably not a big deal as it isn't frequently encountered. However, I can't help thinking that using const std::wstring & instead of std::wstring_view as a return type doesn't yield any downsides (and fixes the above problem) 🤔


how about defaulting target type T to ExprPtr to shorten deserialize and from_string to deserialize<> and from_string<>? These will appear in all tutorials/slides etc. so every few shaved chars will matter.

I was thinking about this as well. I ended up not defaulting this to ensure users are always aware that they can customize the expected type (which I plan to extend in v2 such that we could do from_string<Tensor> which would then also omit creation of a shared_ptr). If we default to ExprPtr, users will be able to completely omit template arguments as in deserialize("bla") at which point it is highly non-obvious that there is an interface for specifying the type…
However, the slide argument is indeed a good one and given that ExprPtr is what people will want in most cases, it doesn't seem too bad (and there are docs, after all). I'll add it 👍

@evaleev
Copy link
Member

evaleev commented Feb 6, 2026

This is just a 1-to-1 translation of the previous requirement. I'm happy to drop it though (latex::to_string will now always be available due to the presence of the _to_latex member function)

It feels like we should drop to_latex from the type requirement that I must be not thinking of something simple. We are probably indeed relying on to_latex to be available somewhere, I don't know of a way to check this directly ... So I'm hesitant to drop it actually. We can revise later.

It does - the change to the test cases happened at a time when I didn't have the overload for ExprPtr yet (and thought I might not need it). It was required in more places though, so I added the overload afterwards.

👍

In fact, I would like to remove the remaining to_(w)string functions in favor of implementing support for std::format, which will allow for a more general (and hopefully also more consistent) framework for converting objects to strings. But that will be done in a separate PR.

👍

Actually, this is something that has been slightly bugging me for some time. Imo using std::string_view as a return value can be somewhat dangerous. Consider the following snippet:

auto name = tensor.label();

Looking at this, you might think that you have now an independent copy of the tensor's label. However, if label returns a view, you have a dangling reference as soon as tensor goes out of scope (which is not what you'd expect if you think you made a copy).

Yes, sure, views are cheap to copy but expensive to reason about :) copy-on-write strings are probably best.

This is arguably not a big deal as it isn't frequently encountered. However, I can't help thinking that using const std::wstring & instead of std::wstring_view as a return type doesn't yield any downsides (and fixes the above problem) 🤔

but even then:

const auto& name = tensor.label();
/// lots of code in which tensor is destroyed ...
std::cout << name; // ouch!

C++'s original sin is use of C defaults (=taking pointers is allowed by default). Value semantics is the way to go, but need optimizations for needless copies (move semantics is not panacea).

type… However, the slide argument is indeed a good one and given that ExprPtr is what people will want in most cases, it doesn't seem too bad (and there are docs, after all). I'll add it 👍

OK, please do it.

@Krzmbrzl Krzmbrzl merged commit 4dd403f into ValeevGroup:master Feb 7, 2026
17 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants