lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Tue, 19 May 2009 19:57:45 GMT


Michael McCandless commented on LUCENE-1614:

> This would save CPU for scorers that merge multiple sub-scorers (like BooleanScorer/2),
> because instead of having to check for -1 returned from each sub-scorer, they could simply
> proceed with their normal logic and check for Integer.MAX_VALUE just before collecting the

But for scorers that use a priority queue, does checking and immediately removing from the
queue (hence making the heap smaller) offer any advantages? I had assumed so since this is
what current scorers do. Immediately removing scorers also causes early termination for minimumNrMatchers>1
in DisjunctionSumScorer.

But that only helps at the tail end of the iteration, vs saving an if
check per sub-scorer, per next() call?

Ie presumably much more CPU is spent iterating while the PQ is full,
than while it's winding down, so saving the if per-sub-scorer-next is

Also, I think over time we should migrate away from the PQ (ie, use
BooleanScorer's batch approach, not Disjunction*Scorer's PQ) since the
batch scoring approach gives better performance.  EG I think we should
extend BooleanScorer to handle MUST clauses.  BooleanScorer handles
doc=Integer.MAX_VALUE for a sub-scorer quite efficiently (the chunk is
always skipped for that sub-scorer, after one if check).
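
To illustrate the sentinel idea discussed above, here is a minimal sketch in
which an exhausted sub-iterator simply reports Integer.MAX_VALUE, so the merge
loop needs only one exhaustion check per candidate doc rather than one per
sub-iterator per call. It is neither BooleanScorer's batch approach nor the
Disjunction*Scorer PQ; it is a plain linear min-scan chosen for brevity. The
names SentinelMergeSketch, ArrayDocIterator, NO_MORE_DOCS and union are
hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

public class SentinelMergeSketch {
    // Sentinel assumed for this sketch: real doc ids must be smaller than this.
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a sub-scorer over a sorted array of doc ids. */
    public static final class ArrayDocIterator {
        private final int[] docs;
        private int pos = -1;

        public ArrayDocIterator(int... docs) {
            this.docs = docs;
        }

        /** Advances and returns the new doc, or NO_MORE_DOCS once exhausted. */
        public int nextDoc() {
            if (pos < docs.length) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    /** Returns the sorted union of all sub-iterators' docs. */
    public static List<Integer> union(ArrayDocIterator... subs) {
        int[] current = new int[subs.length];
        for (int i = 0; i < subs.length; i++) {
            current[i] = subs[i].nextDoc();
        }
        List<Integer> hits = new ArrayList<>();
        while (true) {
            // Exhausted subs hold MAX_VALUE and can never win the min,
            // so no per-sub "is it done?" branch is needed here.
            int min = NO_MORE_DOCS;
            for (int doc : current) {
                min = Math.min(min, doc);
            }
            if (min == NO_MORE_DOCS) {
                break; // the only exhaustion check, once per candidate doc
            }
            hits.add(min);
            for (int i = 0; i < subs.length; i++) {
                if (current[i] == min) {
                    current[i] = subs[i].nextDoc();
                }
            }
        }
        return hits;
    }
}
```

An exhausted sub-iterator stays in the array and is still compared on every
pass, which is the trade-off raised above against removing it from a PQ.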

> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>                 Key: LUCENE-1614
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>         Attachments: LUCENE-1614.patch
> See
> for the full discussion. The basic idea is to add variants to those two methods that return
> the current doc they are at, to save successive calls to doc(). If there are no more docs,
> return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls
> next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int)
> - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing
> to -1 for improved performance.
> I will post a patch shortly
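
The proposal quoted above could look roughly like the following sketch. It
assumes a simplified iterator contract (boolean next(), boolean skipTo(int),
int doc()) and layers the doc-returning variants on top as default
implementations, returning -1 when there are no more docs. NextDocSketch,
BoolIterator, ArrayIterator and sum are hypothetical names for illustration
only; this is not the attached patch and not Lucene's actual class.

```java
public class NextDocSketch {

    /** Simplified stand-in for the boolean-returning iterator contract. */
    public abstract static class BoolIterator {
        public abstract boolean next();
        public abstract boolean skipTo(int target);
        public abstract int doc();

        /** Default impl as described in the issue: delegate to next(). */
        public int nextDoc() {
            return next() ? doc() : -1;
        }

        /** Default impl as described in the issue: delegate to skipTo(). */
        public int skipToDoc(int target) {
            return skipTo(target) ? doc() : -1;
        }
    }

    /** Hypothetical iterator over a sorted array of non-negative doc ids. */
    public static final class ArrayIterator extends BoolIterator {
        private final int[] docs;
        private int pos = -1;

        public ArrayIterator(int... docs) {
            this.docs = docs;
        }

        @Override
        public boolean next() {
            if (pos < docs.length) {
                pos++;
            }
            return pos < docs.length;
        }

        // Assumed semantics: always advance at least once, then stop at the
        // first doc >= target.
        @Override
        public boolean skipTo(int target) {
            while (next()) {
                if (docs[pos] >= target) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public int doc() {
            return docs[pos];
        }
    }

    /** Caller-side idiom from the issue: one call and one sign test per doc. */
    public static int sum(BoolIterator it) {
        int doc;
        int total = 0;
        while ((doc = it.nextDoc()) >= 0) {
            total += doc;
        }
        return total;
    }
}
```

The loop in sum() replaces a next() call followed by a separate doc() call,
which is the saving the issue description is after.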

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
