Skip to content

Ensure that FilterXxx properly delegate to the decorated instances#15469

Merged
dweiss merged 8 commits intoapache:mainfrom
blerer:15452
Jan 11, 2026
Merged

Ensure that FilterXxx properly delegate to the decorated instances#15469
dweiss merged 8 commits intoapache:mainfrom
blerer:15452

Conversation

@blerer
Copy link
Copy Markdown
Contributor

@blerer blerer commented Dec 3, 2025

This is an initial version of the patch as I am waiting for the result of the discussion on the mailing list.

The patch introduce a new assertion method in LuceneTestCase that verify that a given Decorator class (FilterXxx) properly delegate all methods calls to the wrapped instance. The patch add the missing delegation methods.

Closes #15452

Description

@github-actions
Copy link
Copy Markdown
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions Bot added the Stale label Dec 18, 2025
The patch introduce a new assertion method in LuceneTestCase that
verify that a given base decorator class (Filter) properly delegate
all methods calls to the wrapped instance. The patch add the
missing delegation methods.

Closes apache#15452
Copy link
Copy Markdown
Contributor

@dweiss dweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it very much. Especially the tests that get simplified. I've noticed a few things that seem odd to me but I'll need to understand what's going on better. Please give me some time to dive deeper into it. Looks great though.

@github-actions github-actions Bot removed the Stale label Jan 9, 2026
Copy link
Copy Markdown
Contributor

@dweiss dweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it really much - I think it makes those Filter* classes truly more transparent to use. I would really like to get down to why those few methods have to be added so that tests can pass. It seems like it should be the other way around - we should fix tests if they're relying on incorrect workings of previous filter implementations instead of patching those implementations in the code.

I'll try to return to it, time permitting, but it'd be good if somebody with more knowledge of the scoring system looked into it.

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Jan 9, 2026

LGTM. It would be good to run luceneutil benchmarks to see if we're not adding any regressions here, would you have the time to run them, @blerer ?

@blerer
Copy link
Copy Markdown
Contributor Author

blerer commented Jan 9, 2026

I will try to find some time for it next week. Thanks for the review.

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Jan 9, 2026

No, thank you for doing the heavy lifting! It really is a nice cleanup.

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Jan 11, 2026

I've ran luceneutil benchmarks locally but I don't see much deviation other than noise. I'll merge it in.

@dweiss dweiss merged commit 6af9e96 into apache:main Jan 11, 2026
12 checks passed
@romseygeek
Copy link
Copy Markdown
Contributor

We've seen some CI failures after this merged - reproduce with ./gradlew :lucene:core:test --tests "org.apache.lucene.index.TestFilterLeafReader.testFilterIndexReader" -Ptests.asserts=true -Ptests.file.encoding=UTF-8 -Ptests.gui=true -Ptests.jvmargs= -Ptests.jvms=12 -Ptests.seed=CE6E3124C37548F0 -Ptests.vectorsize=default

The root of the problem is DocIdSetIterator.intoBitSet(), which is very closely tied to nextDoc() behaviour. If you override nextDoc() but not intoBitSet(), and the delegate behaviour doesn't use the default implementation, then you may get incorrect behaviour. This wasn't a problem before because intoBitSet() wasn't delegated.

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Jan 12, 2026

I'll have to look into this to understand exactly what's happening but in general I think this patch reveals existing problems rather than causes them... It really is conceptually better if those filters delegate consistently to subclasses instead of partially reverting the default implementation.

Should we revert until this is sorted out?

@romseygeek
Copy link
Copy Markdown
Contributor

It's tricky, as delegation really does break if you have methods that depend on each other for default implementations.

Yes, let's revert for now and see if there's a better way of doing this.

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Jan 12, 2026

if you have methods that depend on each other for default implementations.

This means the design is somehow screwed - if they depend on a particular implementation of a method in a superclass then either something is missing (an override modifying the behavior consistently in a subclass) or the contract is broken somewhere and makes those classes inconsistent...

I'll revert but I think it'd be good to understand and fix this.

dweiss added a commit that referenced this pull request Jan 12, 2026
@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Jan 12, 2026

Ok, I see what you mean. I'll give it some thought.

dweiss added a commit to dweiss/lucene that referenced this pull request Jan 12, 2026
@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Jan 12, 2026

Right. I've mentioned it before but the decorator pattern is basically always prone to braking something. The thing we missed here is indeed the strong relationship between nextDoc and intoBitSet.

Basically, whenever there is a class that wraps another one, it should provide these two methods consistently - either it knows how to implement them in the filter subclass or it shouldn't touch them at all. TestFilterLeafReader has a nice example where it implements this:

    private static class TestPositions extends FilterPostingsEnum {
      public TestPositions(PostingsEnum in) {
        super(in);
      }

      /** Scan for odd numbered documents. */
      @Override
      public int nextDoc() throws IOException {
        int doc;
        while ((doc = in.nextDoc()) != NO_MORE_DOCS) {
          if ((doc % 2) == 1) return doc;
        }
        return NO_MORE_DOCS;
      }
    }

This will clearly result in a different set of bits if intoBitSet stays delegated.

It is possible to automatically detect and enforce such tightly coupled overrides - I've written a quick example here using archunit -

afeda07

this only checks subclasses of FilterPostingsEnum but it immediately fails with the offender:

java.lang.AssertionError: Architecture Violation [Priority: MEDIUM] - Rule 'classes that are assignable to org.apache.lucene.index.FilterLeafReader$FilterPostingsEnum and are not interfaces should override methods consistently: [nextDoc, intoBitSet]' was violated (1 times):
Class org.apache.lucene.index.TestFilterLeafReader$TestReader$TestPositions (TestFilterLeafReader.java:0) must override the following methods consistently: [nextDoc, intoBitSet] but only overrides: [nextDoc]

	at __randomizedtesting.SeedInfo.seed([77710A3FDD5F4CD4:EB7D1F2EF63D9025]:0)
	at com.tngtech.archunit.lang.ArchRule$Assertions.assertNoViolation(ArchRule.java:94)
	at com.tngtech.archunit.lang.ArchRule$Assertions.check(ArchRule.java:86)
	at com.tngtech.archunit.lang.ArchRule$Factory$SimpleArchRule.check(ArchRule.java:165)
	at com.tngtech.archunit.lang.syntax.ObjectsShouldInternal.check(ObjectsShouldInternal.java:81)
	at org.apache.lucene.TestDesignConstraints.testFilterClasses(TestDesignConstraints.java:42)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)

There doesn't seem to be the right way to do it or at least I can't think of any.

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Jan 12, 2026

There doesn't seem to be the right way to do it or at least I can't think of any.

I mean this in terms of the original problem of whether to delegate everything consistently or not. Both ways have their issues. Maybe we can cherry pick just parts of the patch or identify if this particular method pair is the only problem and write archunit tests around this? I don't know.

@blerer
Copy link
Copy Markdown
Contributor Author

blerer commented Jan 13, 2026

From an API point of view, I feel that intoBitSet() should never had been part of the DocIdSetIterator. It is method used to read data from the iterator into a FixedBitSet. It is somehow similar to adding an intoList method to the Iterator interface.
I suspect that the main reason behind its existence was to allow people to optimize its execution. Which seems to be confirmed by git history: Let DocIdSetIteratoroptimize loading into a FixedBitSet. (#14069)
Considering the amount of time this method is overridden, I do not think that we can say that this method was not supposed to be overridden. Therefore, if somebody decorate your implementation without delegating to its intoBitSet() the performance improvement will be lost silently.
Now considering the current state of the code it feels to me that it is probably safer to avoid delegating intoBitSet() as it will ignore optimizations but guarantee that the behavior is correct. What do you think?

@dweiss
Copy link
Copy Markdown
Contributor

dweiss commented Jan 13, 2026

I agree. The question remains if it's the only thing lurking there that can go wrong... I thought about just ignoring this particular method in delegation checks but had a reflection that maybe we're trying to fix a can of worms that should be left unopened... Please don't let it discourage you patches welcome! :)

@msfroh
Copy link
Copy Markdown
Contributor

msfroh commented Jan 13, 2026

The "existing" behavior (where some stuff is delegated but others are not) does (I think) a pretty good job of satisfying functional correctness while missing out on potential optimizations. The FilterWeight#count() case is also like that -- by not delegating, we just return -1, which forces us to compute the count the long way, even if the delegate could do it quickly.

It's still super trappy in terms of knowing what gets delegated and what doesn't, though.

Another pair of methods whose behavior depend on each other: Weight#scorerSupplier() and Weight#matches() are expected to be semantically the same. A FilterWeight that skips some documents from the wrapped scorerSupplier would need to either override matches to drop the same documents or would need to delegate to Weight#matches, which uses the scorerSupplier directly. In this case, we do delegate both methods but don't have a test that covers the case where only one is overridden.

finnroblin pushed a commit to finnroblin/lucene that referenced this pull request Feb 2, 2026
…pache#15469)

* Ensure that Filter classes properly delegate to the decorated instances

The patch introduce a new assertion method in LuceneTestCase that
verify that a given base decorator class (Filter) properly delegate
all methods calls to the wrapped instance. The patch add the
missing delegation methods.

Closes apache#15452

* Fix delegation from super.slice to in.slice in FilterIndexInput.

* Revert API change and delegate getName properly.

* Simplify the code slightly. Throw RuntimeException on clone() calls on FilterIndexInput subclasses as it's a mistake if they're cloned without the delegate also cloned.

* Add explicit comment where count cannot be approximated in filter subclasses.

* Add a clarification to BooleanScorerSupplier.advanceShallow.

---------

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
finnroblin pushed a commit to finnroblin/lucene that referenced this pull request Feb 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FilterIndexInput and FilterIndexOutput override the behavior of the instance that they decorate

4 participants