lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gary Yngve (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2336) off by one: DisjunctionSumScorer::advance
Date Sat, 20 Mar 2010 17:47:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847797#action_12847797
] 

Gary Yngve commented on LUCENE-2336:
------------------------------------

I don't have a specific use case.  Long story short, I am toying with a BM25 implementation
via subclassing a slightly refactored DisjunctionSumScorer, as the published BM25 research
implementation doesn't work w/ phrases and is too smelly for my tastes.  I'm attempting to
understand how everything works deep in lucene by reading the code and playing with it (maybe
I should try to read choice checkin logs too) and discovered this inconsistency in a unit
test that I was playing with.

The advance docs for DISI state that it behaves as written:

 int advance(int target) {
   int doc;
   while ((doc = nextDoc()) < target) {
   }
   return doc;
 }
 
This to me implies that advance(0) should be equivalent to calling nextDoc() in all cases.
(And this is how advance behaves in TermScorer and PhraseScorer.)

However the real behavior for DSS is more like:

while (docID() < target) {
  if (!next()) return -1;
}
return docID();

On the other hand, the thread you cite states:

> > To me, calling skipTo or advance with the same target multiple times and get different
result every time is weird. I'd like to change that semantic

> But I thought we had just agreed that skipTo(doc) is well defined only for doc>current?
(see "bigger question" point above).
> If we all agree that , then we don't need to worry about skipTo(10) being called twice
in a row.

So it sounds like I'd be reasonably happy with a resolution of documenting advance in DISI
with:
"The behavior of calling advance with a target equal to docID() is undefined,"

and happier with a convincing argument of why this situation never arises within any Lucene
scorers in real life.

-Gary

> off by one: DisjunctionSumScorer::advance
> -----------------------------------------
>
>                 Key: LUCENE-2336
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2336
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>            Reporter: Gary Yngve
>            Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The bug is:
>     if (target <= currentDoc) {
> should be
>     if (target < currentDoc) {
> based on the comments for the method as well as the contract for DocIdSetIterator: "Advances
to the first beyond the current"
> It can be demonstrated by:
>  		assertEquals("advance(1) first match failed", 1, scorer.advance(1));
>  		assertEquals("advance(1) second match failed", n, scorer.advance(1));
> if docId: 1 is a hit and n is the next hit.  (Tests all pass if this code change is made.)
> I'm not labeling it as major because the class is package-protected and currently passes
spec.
> Relevant excerpt:
>  /**
>    * Advances to the first match beyond the current whose document number is
>    * greater than or equal to a given target. <br>
>    * When this method is used the {@link #explain(int)} method should not be
>    * used. <br>
>    * The implementation uses the skipTo() method on the subscorers.
>    * 
>    * @param target
>    *          The target document number.
>    * @return the document whose number is greater than or equal to the given
>    *         target, or -1 if none exist.
>    */
>   public int advance(int target) throws IOException {
>     if (scorerDocQueue.size() < minimumNrMatchers) {
>       return currentDoc = NO_MORE_DOCS;
>     }
>     if (target <= currentDoc) {
>       return currentDoc;
>     }
>     do {
>       if (scorerDocQueue.topDoc() >= target) {
>         boolean b = advanceAfterCurrent();
>         return b ? currentDoc : (currentDoc = NO_MORE_DOCS);
>       } else if (!scorerDocQueue.topSkipToAndAdjustElsePop(target)) {
>         if (scorerDocQueue.size() < minimumNrMatchers) {
>           return currentDoc = NO_MORE_DOCS;
>         }
>       }
>     } while (true);
>   }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message