lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1614) Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
Date Tue, 19 May 2009 20:51:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710880#action_12710880 ]

Michael McCandless commented on LUCENE-1614:
--------------------------------------------

bq. About using Integer.MAX_VALUE as sentinel, did anyone consider what happens when the first
index actually reaches that number of documents?

Lucene already uses Integer.MAX_VALUE as a sentinel (e.g., the score(Collector) methods in
TermScorer, BooleanScorer and BooleanScorer2), so a Lucene index can already contain at most
Integer.MAX_VALUE docs.
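
A minimal sketch of why a large sentinel is convenient, assuming a hypothetical SentinelIterator (not Lucene's actual class) whose nextDoc() returns Integer.MAX_VALUE once exhausted. Because the sentinel compares greater than any real doc id, a merge loop needs no separate "is this side exhausted?" branch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a doc-id iterator; not Lucene's actual class.
class SentinelIterator {
    // Assumed sentinel value, as discussed in the comment above.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // sorted ascending, all below NO_MORE_DOCS
    private int pos = -1;

    SentinelIterator(int... docs) { this.docs = docs; }

    // Moves to the next doc and returns it, or NO_MORE_DOCS when exhausted.
    int nextDoc() {
        if (pos < docs.length) pos++;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

class SentinelUnion {
    // Union of two sorted iterators. An exhausted side simply never
    // supplies the minimum again, so it needs no special handling.
    static List<Integer> union(SentinelIterator a, SentinelIterator b) {
        List<Integer> out = new ArrayList<>();
        int da = a.nextDoc();
        int db = b.nextDoc();
        while (da != SentinelIterator.NO_MORE_DOCS || db != SentinelIterator.NO_MORE_DOCS) {
            int min = Math.min(da, db);
            out.add(min);
            if (da == min) da = a.nextDoc();
            if (db == min) db = b.nextDoc();
        }
        return out;
    }
}
```

The cost is the one noted above: the sentinel value itself can never be used as a real doc id.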

bq. On moving from the priority queue (DisjunctionSumScorer/BooleanScorer2) to the batch approach
(BooleanScorer): I did not find a way to do that while scoring docs in docId order. 

What breaks if we allow docs to be collected out of order (besides external Hit/Collector)?
As of LUCENE-1575, the core collectors can gain performance if they know the docs will be
collected in order, but they can also handle out-of-order collection just fine.
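
To illustrate the kind of saving in-order collection can allow, here is a rough sketch. BestHitCollector is a hypothetical class written for this example and is not Lucene's Collector API; it keeps the single best hit and breaks score ties in favor of the lower doc id. When docs are known to arrive in order, the tie check collapses into one comparison:

```java
// Hypothetical collector that keeps only the best hit, preferring the
// lower doc id on equal scores. Not Lucene's Collector API.
class BestHitCollector {
    private final boolean docsInOrder;
    int bestDoc = -1;
    float bestScore = Float.NEGATIVE_INFINITY;

    BestHitCollector(boolean docsInOrder) { this.docsInOrder = docsInOrder; }

    void collect(int doc, float score) {
        if (docsInOrder) {
            // In order: every later doc has a higher id, so a tie never wins.
            if (score <= bestScore) return;
        } else {
            // Out of order: a tie can still win if its doc id is lower.
            if (score < bestScore || (score == bestScore && doc > bestDoc)) return;
        }
        bestDoc = doc;
        bestScore = score;
    }
}
```

Both modes give the same answer for the same set of hits; the out-of-order mode just pays for an extra comparison on ties.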

bq. The priority queue can be made faster by inlining (there is a patch for that, I can't
get to the issue number now), but that's about the limit as far as I can see.

I think PQ is fundamentally not very friendly to modern CPUs, because of the hard-to-predict
ifs; I think that's part of why the batch collection shows such gains.

This doesn't hurt us so much during hit collection, which also uses PQ, since the queue typically
quickly converges, but for OR scoring the PQ is intensely used the whole time.
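
For contrast with a priority queue, the following is a loose sketch of a batch (windowed) approach to OR scoring. It is only an illustration under assumptions and does not reproduce BooleanScorer's actual code: the class name, the tiny WINDOW size, and the per-clause constant weights are all made up for the example. Each clause is drained into a small accumulator array one window at a time, and hits are emitted in the order their slots were first touched, which is why docs can come out of docId order:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; names and sizes are assumptions for this sketch.
class BatchOrSketch {
    static final int WINDOW = 8; // deliberately tiny for the example

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // postings[i] holds the sorted doc ids (0 <= doc < maxDoc) matching
    // clause i; every match of clause i contributes weights[i].
    static List<Hit> score(int[][] postings, float[] weights, int maxDoc) {
        List<Hit> hits = new ArrayList<>();
        int[] pos = new int[postings.length];  // read position per clause
        float[] acc = new float[WINDOW];       // summed score per slot
        boolean[] seen = new boolean[WINDOW];  // slot touched in this window?
        int[] touched = new int[WINDOW];       // slots in first-touch order
        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = base + WINDOW;
            int n = 0;
            // Drain each clause into the window: a simple loop, no heap.
            for (int i = 0; i < postings.length; i++) {
                int[] p = postings[i];
                while (pos[i] < p.length && p[pos[i]] < end) {
                    int slot = p[pos[i]++] - base;
                    if (!seen[slot]) {
                        seen[slot] = true;
                        acc[slot] = 0f;
                        touched[n++] = slot;
                    }
                    acc[slot] += weights[i];
                }
            }
            // Emit in first-touch order, which need not be docId order.
            for (int k = 0; k < n; k++) {
                int slot = touched[k];
                hits.add(new Hit(base + slot, acc[slot]));
                seen[slot] = false;
            }
        }
        return hits;
    }
}
```

With postings {1, 9} and {0, 1, 12}, the first window emits doc 1 before doc 0, because the first clause touched slot 1 before the second clause touched slot 0.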


> Add next() and skipTo() variants to DocIdSetIterator that return the current doc, instead of boolean
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1614
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1614.patch
>
>
> See http://www.nabble.com/Another-possible-optimization---now-in-DocIdSetIterator-p23223319.html
> for the full discussion. The basic idea is to add variants to those two methods that return
> the current doc they are at, to save successive calls to doc(). If there are no more docs,
> return -1. A summary of what was discussed so far:
> # Deprecate those two methods.
> # Add nextDoc() and skipToDoc(int) that return doc, with default impl in DISI (calls
> next() and skipTo() respectively, and will be changed to abstract in 3.0).
> #* I actually would like to propose an alternative to the names: advance() and advance(int)
> - the first advances by one, the second advances to target.
> # Wherever these are used, do something like '(doc = advance()) >= 0' instead of comparing
> to -1 for improved performance.
> I will post a patch shortly
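
The calling pattern proposed in the issue description might look roughly like this. ArrayDocIterator is a hypothetical array-backed stand-in, not the patch's code; it uses the advance() and advance(int) names suggested in the description and returns -1 when there are no more docs:

```java
// Hypothetical array-backed iterator; not the code from the patch.
class ArrayDocIterator {
    private final int[] docs; // sorted ascending, all non-negative
    private int pos = -1;

    ArrayDocIterator(int... docs) { this.docs = docs; }

    // Advances by one and returns the new current doc, or -1 if exhausted.
    int advance() {
        if (pos < docs.length) pos++;
        return pos < docs.length ? docs[pos] : -1;
    }

    // Advances at least once, stopping at the first doc >= target;
    // returns that doc, or -1 if exhausted.
    int advance(int target) {
        int doc;
        while ((doc = advance()) >= 0 && doc < target) {
            // keep stepping until the target is reached
        }
        return doc;
    }
}

class AdvanceLoop {
    // The caller gets the doc from the return value, so no separate
    // doc() call is needed after each step.
    static int sumDocs(ArrayDocIterator it) {
        int doc;
        int total = 0;
        while ((doc = it.advance()) >= 0) {
            total += doc;
        }
        return total;
    }
}
```

The old style would pair a boolean next() with a follow-up doc() call on every iteration; here a single call both moves the iterator and reports where it landed.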

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

