Conversation

@Liam-DeVoe (Member)

Part of #2643. I think swarm testing is great and would love to get it into more places in Hypothesis. There are still some problems with this implementation (see review comments).

Comment on lines +65 to 71
# this really messes up our deduplication tracking, because all 255
# draws are unique. But we more or less have to choose whether something
# is enabled on-demand with a prior probability, rather than choosing what
# is enabled up front, because the latter results in a very large choice
# sequence when there are lots of possibilities.
# (a tradeoff might be selecting up front when there are <= 3 options?)
self.__p_disabled = self.__data.draw_integer(0, 254) / 255
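The on-demand scheme described in the comment above can be sketched outside of Hypothesis. This is a toy sketch only (the class and method names are hypothetical, and it uses stdlib `random` rather than the choice sequence): one disable probability is drawn up front, then each feature's status is decided lazily on first use and cached so repeated queries agree within a run:

```python
import random

class SwarmFeatures:
    """Toy sketch of on-demand swarm testing (not Hypothesis's
    FeatureStrategy): draw a disable probability once, then decide each
    feature lazily with that prior, caching the decision."""

    def __init__(self, rng):
        # Mirrors draw_integer(0, 254) / 255: p_disabled lands in [0, 1),
        # so no feature is ever disabled with certainty.
        self.p_disabled = rng.randrange(255) / 255
        self.rng = rng
        self.decisions = {}

    def is_enabled(self, name):
        # Decide on demand, caching so repeated queries for the same
        # feature agree within one test case.
        if name not in self.decisions:
            self.decisions[name] = self.rng.random() >= self.p_disabled
        return self.decisions[name]

features = SwarmFeatures(random.Random(0))
first = features.is_enabled("floats")
```

The upside of deciding on demand is that features never queried cost nothing in the choice sequence; the downside, as the comment notes, is that the up-front probability draw itself adds a near-unique prefix to every test case.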
@Liam-DeVoe (Member Author)
See comment; not too sure what to do here. Here's one example of this added redundancy:

import re
from hypothesis import given, strategies as st

@given(st.from_regex(re.compile("[ab]*", re.IGNORECASE), fullmatch=True))
def f(v):
    print(v)
f()

which prints "A" once on master but ~8 times on this branch.

Member:

If self.__p_disabled is only ever used as the p= argument to draw_boolean(), and is strictly between 0 and 1, do we need to choose it via the choice sequence at all? We could instead use data.random(), or 0.5 if that's not available.
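As a plain-Python sketch of this suggestion (`choose_p_disabled` is a hypothetical helper, not Hypothesis API): the probability could come from an out-of-band RNG when one is available, with a fixed fallback otherwise:

```python
import random

def choose_p_disabled(rng=None):
    """Hypothetical sketch: since p_disabled only shapes the distribution
    and is never replayed from the choice sequence, it could come from an
    out-of-band RNG, falling back to 0.5 when none exists."""
    if rng is not None:
        return rng.random()
    return 0.5
```

The thread below explains why this interacts badly with DataTree's flakiness checking as currently implemented.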

@Liam-DeVoe (Member Author), Mar 26, 2025

Sort of. If we don't draw this from the choice sequence, DataTree thinks the later is_disabled draw is flaky, because the p kwarg passed to draw_boolean has changed. It's not actually flaky, because the values in the choice sequence don't depend on p (as long as we're careful about p=0.0 and p=1.0), but DataTree doesn't know that.

We could imagine some kind of "ignore this param for flakiness" api, but I'm not sure how that would look.

Alternatively, we could hardcode that p in draw_boolean shouldn't be checked for flakiness. It controls the distribution, not the domain, so arguably consumers should be able to change it without triggering a flakiness error...

Member:

I think ignoring distribution-only arguments in flakiness checks makes sense; we can still replay perfectly.

Comment on lines +699 to 703
self.enabled_branches_strategy = st.shared(
FeatureStrategy(self.original_strategies),
key=("one_of swarm testing", self.original_strategies),
)

@Liam-DeVoe (Member Author)

I think that using st.shared for the FeatureStrategy means that any separately defined strategies which happen to have the same key will have the same swarm behavior. For example, two separate usages of st.integers() | st.floats() will use the same swarm decision about what to disable. But I believe they will still draw separate random choices below the swarm step, so the values won't be the same, just the disabling decision.

I think diversity in whether to share swarm decisions here would be best for bug finding. What we might be able to do is draw a meta p which chooses, for each occurrence of the same-keyed strategy s, whether to reuse swarm behavior or not. But I don't know if this would be easy to implement.
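The "meta p" idea could look roughly like this toy sketch (stdlib `random` only; `swarm_decisions` and `reuse_p` are hypothetical names, not part of Hypothesis): one shared decision set exists per key, and each occurrence flips a `reuse_p`-weighted coin to decide whether to reuse it or draw fresh decisions:

```python
import random

def swarm_decisions(rng, n_occurrences, n_branches):
    """Toy sketch of the 'meta p' idea: per key, draw one shared set of
    enable/disable decisions plus a reuse probability; each occurrence of
    the same-keyed strategy independently chooses whether to reuse the
    shared decisions or draw its own."""
    reuse_p = rng.random()
    shared = [rng.random() < 0.5 for _ in range(n_branches)]
    out = []
    for _ in range(n_occurrences):
        if rng.random() < reuse_p:
            out.append(shared)  # reuse the shared swarm behavior
        else:
            # fresh, independent disable decisions for this occurrence
            out.append([rng.random() < 0.5 for _ in range(n_branches)])
    return out

decisions = swarm_decisions(random.Random(1), 3, 4)
```

Because `reuse_p` is itself drawn, test cases range from fully shared swarm behavior to fully independent behavior, which is the diversity argued for above.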

Member:

sounds complicated; let's leave that for a follow-up PR.

@Liam-DeVoe (Member Author)

This is a core change to our generation logic, which we also just changed in #4356. Since I'm not confident this PR will be regression-free (in either performance or bug-finding ability), I'm tempted to defer it for a bit to let any issues with the constants pool shake out first.
