lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Wed, 20 May 2009 10:07:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711082#action_12711082
] 

Michael McCandless commented on LUCENE-1614:
--------------------------------------------

I'm not referring to BS/2's exposure of advance/nextDoc to their
callers; I'm talking about how BS/2 invoke advance/nextDoc on their
sub-scorers.

So, in BooleanScorer, starting on line 239 (w/ your patch), is this:
{code}
int doc = sub.done ? -1 : scorer.doc();
while (!sub.done && doc < end) {
  sub.collector.collect(doc);
  doc = scorer.nextDoc();
  sub.done = doc < 0;
}
{code}
this is the hotspot for BooleanScorer: it's advancing through a chunk
(of 2048 docs) at once, for that one sub-scorer.  Note the checks &
assignments to sub.done that are required....

If instead we switch to Integer.MAX_VALUE as the sentinel, that loop
is simplified to this, instead:

{code}
int doc = scorer.doc();
while (doc < end) {
  sub.collector.collect(doc);
  doc = scorer.nextDoc();
}
{code}

scorer.done is no longer computed nor checked.  What makes this
possible is the existing "doc < end" check can be "reused" since the
sentinel moves "forwards", not backwards.

With this, we'd also need to change the toplevel collection (starting
on line 150) to stop processing once all sub-scorers have advanced to
the sentinel, but this is 1 added if per 2048 docs so the added cost
(vs savings of not computing/checking scorer.done) is tiny.

In ConjunctionScorer's doNext method (its hotspot), it currently does this:
{code}
while (more && (firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc()))
{
  more = firstScorer.advance(lastDoc) >= 0;
  lastScorer = firstScorer;
  first = (first == (scorers.length-1)) ? 0 : first+1;
}
return more;
{code}

with the sentinel change, it would do this:

{code}
while ((firstScorer=scorers[first]).doc() < (lastDoc=lastScorer.doc())) {
  firstScorer.advance(lastDoc);
  lastScorer = firstScorer;
  first = (first == (scorers.length-1)) ? 0 : first+1;
}
return lastDoc != DOC_SENTINEL;
{code}

ie, we no longer assign to, nor check, the more boolean.  What makes
this possible is we know all sub-scorers will advance to the sentinel,
so we know that sentinel doc will pass the existing checks in the
while loop.

bq. Also, given Marvin's response above, using 0 as sentinel is no different than using -1
in terms of "suddenly moving backwards".

I agree that Marvin's choice of 0 is also "suddenly moving backwards",
but it still seems to me to be a poor choice since it costs our
BooleanScorers more CPU in their hotspots.


> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead
of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
for the full discussion. The basic idea is to add variants to those two methods that return
the current doc they are at, to save successive calls to doc(). If there are no more docs,
return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls
next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int)
- the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing
to -1 for improved performance.
> I will post a patch shortly

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message