lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2336) off by one: DisjunctionSumScorer::advance
Date Sat, 20 Mar 2010 05:58:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847716#action_12847716
] 

Shai Erera commented on LUCENE-2336:
------------------------------------

Hi Gary

This has been discussed before (I'm not sure if about DisjunctionSumScorer specifically),
and therefore there is also a NOTE in advance() of DISI:
{code}
   * <b>NOTE:</b> certain implementations may return a different value (each
   * time) if called several times in a row with the same target.
{code}
Note the *may return a different value...* part. I remember while working on LUCENE-1614 that
this has been discussed and thus we ended up w/ documenting that *may return* part. See here:
https://issues.apache.org/jira/browse/LUCENE-1614?focusedCommentId=12710860&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12710860
and read some above and below to see relevant discussion.

I'll need to refresh my memory though why DisjunctionSumScorer works like that ... perhaps
an overlook on my side from 1614, but perhaps there was a reason.

Anyway, about the code example you gave above, why would you want to call advance w/ the same
value many times? What's the use case? If you're only dealing w/ one DISI, then unless you
really want to skip to a certain document, I don't see any reason for calling advance. The
usage is typically if you have 2 or more DISIs, and one's nextDoc or advance returned a value
that is greater than the other's doc() ...

Also, it's risky to write the code you wrote, because some scorers, upon init are already
on a certain doc (I think the Disj. ones, but maybe also the Conj. one), and so by calling
advance(1), you will actually *skip* over the first document and miss a hit.

Can you clarify the usage then?

> off by one: DisjunctionSumScorer::advance
> -----------------------------------------
>
>                 Key: LUCENE-2336
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2336
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>            Reporter: Gary Yngve
>            Priority: Minor
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The bug is:
>     if (target <= currentDoc) {
> should be
>     if (target < currentDoc) {
> based on the comments for the method as well as the contract for DocIdSetIterator: "Advances
to the first beyond the current"
> It can be demonstrated by:
>  		assertEquals("advance(1) first match failed", 1, scorer.advance(1));
>  		assertEquals("advance(1) second match failed", n, scorer.advance(1));
> if docId: 1 is a hit and n is the next hit.  (Tests all pass if this code change is made.)
> I'm not labeling it as major because the class is package-protected and currently passes
spec.
> Relevant excerpt:
>  /**
>    * Advances to the first match beyond the current whose document number is
>    * greater than or equal to a given target. <br>
>    * When this method is used the {@link #explain(int)} method should not be
>    * used. <br>
>    * The implementation uses the skipTo() method on the subscorers.
>    * 
>    * @param target
>    *          The target document number.
>    * @return the document whose number is greater than or equal to the given
>    *         target, or -1 if none exist.
>    */
>   public int advance(int target) throws IOException {
>     if (scorerDocQueue.size() < minimumNrMatchers) {
>       return currentDoc = NO_MORE_DOCS;
>     }
>     if (target <= currentDoc) {
>       return currentDoc;
>     }
>     do {
>       if (scorerDocQueue.topDoc() >= target) {
>         boolean b = advanceAfterCurrent();
>         return b ? currentDoc : (currentDoc = NO_MORE_DOCS);
>       } else if (!scorerDocQueue.topSkipToAndAdjustElsePop(target)) {
>         if (scorerDocQueue.size() < minimumNrMatchers) {
>           return currentDoc = NO_MORE_DOCS;
>         }
>       }
>     } while (true);
>   }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message