Implement swarm testing for st.one_of #4322

base: master
Changes from 1 commit
```diff
@@ -0,0 +1,5 @@
+RELEASE_TYPE: patch
+
+|st.one_of| now chooses a subset of its strategies to disable each time it generates a value. For example, it was previously unlikely that ``st.lists(st.integers() | st.floats() | st.text())`` would generate a long list containing only string values. This is now more likely, along with other uncommon combinations.
+
+This technique is called `swarm testing <https://users.cs.utah.edu/~regehr/papers/swarm12.pdf>`__, and can considerably improve bug-finding power, for instance because some features actively prevent other interesting behavior from running. See :issue:`2643` for more details.
```
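As a quick illustration of the behavior this release note describes, the following sketch prints the set of element types in each generated list; with swarm testing, single-type outputs show up noticeably more often. The strategy, settings, and function name here are invented for illustration, not taken from the PR:

```python
from hypothesis import given, settings, strategies as st

@given(st.lists(st.integers() | st.floats() | st.text(), min_size=10))
@settings(max_examples=20)
def show_type_mix(xs):
    # With swarm testing, some test cases disable e.g. the integer and
    # float branches entirely, so homogeneous (single-type) lists
    # become much more likely.
    print({type(x).__name__ for x in xs})

show_type_mix()
```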
```diff
@@ -28,6 +28,7 @@
     overload,
 )

+from hypothesis import strategies as st
 from hypothesis._settings import HealthCheck, Phase, Verbosity, settings
 from hypothesis.control import _current_build_context, current_build_context
 from hypothesis.errors import (
```
```diff
@@ -689,10 +690,16 @@ class OneOfStrategy(SearchStrategy[Ex]):
     """

     def __init__(self, strategies: Sequence[SearchStrategy[Ex]]):
+        from hypothesis.strategies._internal.featureflags import FeatureStrategy
+
         super().__init__()
         self.original_strategies = tuple(strategies)
         self.__element_strategies: Optional[Sequence[SearchStrategy[Ex]]] = None
         self.__in_branches = False
+        self.enabled_branches_strategy = st.shared(
+            FeatureStrategy(self.original_strategies),
+            key=("one_of swarm testing", self.original_strategies),
+        )
```
Comment on lines +699 to 703

Member (Author):
I think that using st.shared for the FeatureStrategy means that any separately-defined strategies which happen to have the same key will have the same swarm behavior, i.e. two separate usages of st.integers() | st.floats() will use the same swarm decision of what to disable. But I believe they will still draw separate random choices below the swarm step, so the values won't be the same, just the disabling decision. I think diversity in whether to share swarm decisions here would be best for bug finding. What we might be able to do is draw a meta p which chooses, for each occurrence of the same-keyed strategy…

Member:
Sounds complicated; let's leave that for a follow-up PR.
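For context on the concern above: st.shared with equal keys yields the same drawn value everywhere it appears within a single test case, which is why two same-keyed one_of strategies would share one disabling decision. A minimal demonstration of that sharing semantics (key name and function name are invented here):

```python
from hypothesis import given, strategies as st

@given(
    st.shared(st.integers(), key="swarm-demo"),
    st.shared(st.integers(), key="swarm-demo"),
)
def shared_key_draws_once(a, b):
    # Two separately-written strategies with the same key receive the
    # same value within any single example.
    assert a == b

shared_key_draws_once()
```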
```diff
     def calc_is_empty(self, recur: RecurT) -> bool:
         return all(recur(e) for e in self.original_strategies)
```
```diff
@@ -739,9 +746,10 @@ def calc_label(self) -> int:
     )

     def do_draw(self, data: ConjectureData) -> Ex:
+        feature_flags = data.draw(self.enabled_branches_strategy)
         strategy = data.draw(
             SampledFromStrategy(self.element_strategies).filter(
-                lambda s: s.available(data)
+                lambda s: s.available(data) and feature_flags.is_enabled(s)
             )
         )
         return data.draw(strategy)
```
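As a standalone illustration of what this do_draw change accomplishes, here is the swarm-testing idea sketched in plain Python, independent of Hypothesis internals; all names and the 0.5 disabling probability are invented for the sketch:

```python
import random

def swarm_draw(generators, rng):
    # Swarm testing: each test case independently disables a random
    # subset of features (here, generators), then draws only from the
    # ones that remain enabled. At least one must survive.
    enabled = []
    while not enabled:
        enabled = [g for g in generators if rng.random() >= 0.5]
    return rng.choice(enabled)()

rng = random.Random(0)
generators = [
    lambda: rng.randint(0, 100),  # "integers" feature
    lambda: rng.random(),         # "floats" feature
    lambda: "text",               # "text" feature
]
print([swarm_draw(generators, rng) for _ in range(5)])
```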
Comment:
See comment; not too sure what to do here. Here's one example of this added redundancy:

…

which prints "A" once on master but ~8 times on this branch.
Comment:
If self.__p_disabled is only ever used as the p= argument to draw_boolean(), and is strictly between 0 and 1, do we need to choose it via the choice sequence at all? vs. data.random(), or 0.5 if that's not available?
Comment:
Sort of. If we don't draw this from the choice sequence, DataTree thinks the later is_disabled draw is flaky because the kwargs p passed to draw_boolean has changed. It's not actually flaky, because the values in the choice sequence don't depend on p (as long as we're careful about p=0.0 and p=1), but DataTree doesn't know that. We could imagine some kind of "ignore this param for flakiness" API, but I'm not sure how that would look.

Alternatively, hardcode that p in draw_boolean shouldn't be checked for flakiness? It's controlling the distribution, not the domain, so arguably consumers should be able to change it without flakiness...
Comment:
I think ignoring distribution-only arguments in flakiness checks makes sense; we can still replay perfectly.
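To spell out the replay argument: during replay the boolean value is read back from the recorded choice sequence, so p only shapes fresh generation. A toy sketch of that reasoning, not Hypothesis's actual draw_boolean implementation:

```python
import random

def draw_boolean(p, recorded=None, rng=None):
    # During generation, p controls the probability of True. During
    # replay, the recorded bit is returned unchanged, so any 0 < p < 1
    # reproduces the same value: p is distribution-only.
    if recorded is not None:
        return recorded
    return rng.random() < p

rng = random.Random(1)
bit = draw_boolean(0.25, rng=rng)               # generated under one p
assert draw_boolean(0.75, recorded=bit) == bit  # replayed under another p
```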