Skip to content

Conversation

@mr-martian
Copy link
Contributor

fixes #44, I think.

@flammie @unhammer is there a better way to accomplish this?

@unhammer
Copy link
Member

unhammer commented Aug 4, 2025

Treating weighted epsilons as non-epsilons seems reasonable to me. I can't think of a better way. But there are a lot of other places in the code where epsilons are referred to so I can't say I feel very confident all cases are covered (for example linkStates seems to often be called like linkStates(state, initial, epsilon_tag, default_weight) while the function is defined as linkStates(int const source, int const target, int const tag, double const weight) and doesn't include this exception – are weighted epsilons irrelevant there?).

But if it passes tests and solves bugs we could just merge and see what happens :) Unless @flammie knows of something relevant in The Literature?

@flammie
Copy link
Member

flammie commented Aug 4, 2025

yeah I don't know the whole lttoolbox code so well but yeah weighted epsilons processing as non-epsilons is quite common, in openfst style if you wanna use non-weight aware algorithms for weighted automaton there's the weight encode- thing which just turns all labels into label+weight so non-zero weights make epsilons into a non-epsilon symbol too. For traversal as or such weighted epsilons just need to semiring + and - the weight traversing like any symbol, not sure if there's anything elegant written about it anywhere.

as a bottom line I agree, if it passes the tests we just have to test and see :-D

@unhammer
Copy link
Member

unhammer commented Aug 4, 2025

the weight encode- thing which just turns all labels into label+weight so non-zero weights make epsilons into a non-epsilon symbol too

So one alternative would be to do this in compilation – that might mean fewer places in the runtime code to check for correctness? Might have some effect on transducer size though.

@mr-martian
Copy link
Contributor Author

I suppose a more accurate name for this PR would be to specify that it only changes the behavior of determinize() since other operations don't remove states and thus don't need special casing. We should probably double check compose() as well, which I'm much less familiar with.

I would like to add pushWeights() at some point to minimize the impact of this change, but I've had trouble finding an explanation of the algorithm.

@unhammer
Copy link
Member

unhammer commented Aug 4, 2025

Oh hah I for some reason thought this was happening in fst_processor. Then it makes more sense.

@mr-martian mr-martian merged commit 6e03b9f into main Aug 4, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Weights are ignored in monolingual dictionary entries

4 participants