-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Correct TermOrdValComparator competitive iterator intoBitSet implementation #14523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct TermOrdValComparator competitive iterator intoBitSet implementation #14523
Conversation
…re similar to other impls
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java
Outdated
Show resolved
Hide resolved
@@ -434,6 +434,9 @@ public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws IOExcept | |||
if (competitiveIterator.docID() < doc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is strange. All other intoBitSet
impls that I can find call the competitiveIterator to advance to at least offset
, not to some internal doc
. I suppose NumericComparator is unique?
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
LeafCollector leafCollector = collector.getLeafCollector(context); | ||
BulkScorer bulkScorer = weight.bulkScorer(context); | ||
// We need to split on this specific doc ID so that the current doc of the competitive | ||
// iterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe format this single-line word better?
…tation (#14523) Similar to @ChrisHegarty 's change for the count fix, this will move up the max check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases. Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet. closes: #14517
…tation (#14523) Similar to @ChrisHegarty 's change for the count fix, this will move up the max check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases. Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet. closes: #14517
if (upTo <= doc) { | ||
return; | ||
} | ||
// Optimize the case when intersecting the competitive iterator is expensive, which is when it | ||
// hasn't nailed down a disjunction of competitive terms yet. | ||
if (disjunction == null) { | ||
if (docsWithField != null) { | ||
// we need to be absolutely sure that the iterator is at least at offset | ||
if (docsWithField.docID() < offset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract from DocIdSetIterator#intoBitSet
requires starting from the current doc rather than the first doc on or after offset
, so it would be more correct to advance to doc
here. In practice, it wouldn't make a difference today since all call sites first advance iterators to offset
before calling intoBitSet
, but it may not always be the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contract from DocIdSetIterator#intoBitSet requires starting from the current doc rather than the first doc on or after offset, so it would be more correct to advance to doc here.
So things might got broken when current doc is after the first doc like
CompetitiveIterator competitiveIterator = CompetitiveIterator.of(1, 2, 3, 4);
competitiveIterator.advance(2); // advance current doc to 2
competitiveIterator.update(0, 2000); // update to doc value
final int offset = 1;
if (competitiveIterator.docID() < offset) {
competitiveIterator.advance(offset);
}
competitiveIterator.intoBitSet(upTo, bitset, offset); // we should start from 2 instead of 1.
? Nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
…tation (apache#14523) Similar to @ChrisHegarty 's change for the count fix, this will move up the max check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases. Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet. closes: apache#14517
Similar to @ChrisHegarty 's change for the count fix, this will move up the
max
check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases.Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet.
closes: #14517