lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yonik Seeley <>
Subject Re: lucene 2.9 sorting algorithm
Date Thu, 15 Oct 2009 15:53:54 GMT
On Thu, Oct 15, 2009 at 4:31 AM, Jake Mannix <> wrote:
>> Conversion from one segment to another is only
>> done as needed... only the bottom slot is converted automatically when
>> the segment is switched.
> That's not what it looks like, actually: you convert the bottom slot, and
> as soon as you get another hit from the new segment which is
> competitive, it pushes what was the old bottom out, and there's a new
> bottom, which often will be from one of the previous segments, in which
> case *it* must be converted to the new ordering as well.

That's exactly what I said... on a segment switch you *only* convert
the bottom slot.  Other conversions are done "as needed" when you get
competitive hits.

>> Filling a complete priority queue for each segment is also more
>> expensive (from the big-O perspective).
> Huh?  If you have M roughly balanced segments of N docs each, and
> you want to find the top K (out of all M*N) sorted docs

I meant on a much simpler level (big-O doesn't normally consider
constants, that's what I was trying to get across).
A single priority queue means fewer priority queue insertions (with
the inserts being more expensive in general).

And it seems like a PQ per segment simply delays many of the slow
lookups until the end where the PQs must be merged.
And as Mike points out, you throw in things like branch prediction and
everything else, and one really just needs to try it out to know for

> After all - the multiple queues approach is what
> lucene has always done for the multi-reader case

Lucene has always used a single PQ for the collector for MultiReader.
Previous to 2.9, the merge was done at a lower level in the
TermEnum/TermDocs implementation, and the IndexSearcher code was
oblivious whether it was searching a SegmentReader or a MuiltiReader


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message