lucene-dev mailing list archives

From: Yonik Seeley <yo...@lucidimagination.com>
Subject: Re: lucene 2.9 sorting algorithm
Date: Thu, 15 Oct 2009 03:57:04 GMT
Interesting idea... though one further piece of info in the mix is
that large segments are typically processed first, and tend to fill up
the priority queue.  Conversion of queue entries from one segment's
ord space to another's is only done as needed... only the bottom slot
is converted automatically when the segment is switched.
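
To make the bottom-slot conversion concrete, here is a minimal runnable
sketch (hypothetical names, not Lucene's actual classes): on a segment
switch, only the bottom slot's value is re-resolved to an ord in the new
segment's sorted term dictionary, via a binary search.

import java.util.Arrays;

// Sketch only: the queue's bottom slot is the one entry whose value is
// eagerly re-resolved into the new segment's ord space.
public class BottomConversionSketch {
    static String bottomValue = "melon"; // value held by the queue's bottom slot
    static int bottomOrd;                // ord valid only for the current segment

    // Called when collection moves on to the next segment.
    static void setNextReader(String[] sortedTerms) {
        int pos = Arrays.binarySearch(sortedTerms, bottomValue);
        // If the value is absent, use the insertion point so that ord
        // compares against this segment's docs still order correctly.
        bottomOrd = pos >= 0 ? pos : -pos - 1;
    }

    public static void main(String[] args) {
        setNextReader(new String[] {"apple", "kiwi", "melon", "pear"});
        System.out.println("bottom ord in new segment: " + bottomOrd); // prints 2
    }
}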

Filling a complete priority queue for each segment is also more
expensive (from the big-O perspective).  An optimization would be to
keep track of the minimum needed to make it into the queue (a
short-circuit)... and that is essentially what the current
implementation does today with the single queue.
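
The short-circuit is easy to see in isolation.  A tiny runnable sketch
with made-up sort keys (plain Java, nothing Lucene-specific): once the
shared queue is full, every new hit is tested against the bottom entry
first, and most hits from later segments are rejected by that single
comparison.

import java.util.PriorityQueue;

// Sketch only: a single shared top-numHits queue.  The head of the
// min-heap is the "minimum needed to make it into the queue".
public class ShortCircuitSketch {
    public static void main(String[] args) {
        int numHits = 3;
        PriorityQueue<Integer> queue = new PriorityQueue<Integer>();
        int[] hits = {50, 42, 97, 7, 13, 60, 3, 99}; // made-up sort keys
        int cheapRejects = 0;
        for (int key : hits) {
            if (queue.size() < numHits) {
                queue.add(key);       // still filling the queue
            } else if (key <= queue.peek()) {
                cheapRejects++;       // short-circuit: cannot beat the bottom
            } else {
                queue.poll();         // evict the bottom entry
                queue.add(key);
            }
        }
        System.out.println("top-" + numHits + " kept: " + queue
            + ", hits rejected by the bottom check: " + cheapRejects);
    }
}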

-Yonik
http://www.lucidimagination.com


On Wed, Oct 14, 2009 at 11:12 PM, John Wang <john.wang@gmail.com> wrote:
> Hi guys:
>     Looking at the 2.9 sorting algorithm, and while trying to understand
> the FieldComparator class, I was wondering about the following
> optimization (using StringOrdValComparator as an example):
> Currently we have one instance of the per-segment data structures (ords,
> vals, etc.), and we keep track of the reader context using the readerGen
> array. We keep track of the top numHits elements by keeping a reference
> to the bottom element. To convert from readerContext1 to readerContext2,
> we need a mapping from value->ord, and the cost is a binary search (which
> can be done up to N*numHits times). For some custom sort extensions, this
> lookup may be expensive.
> How about instead we keep N instances of the segment data structure, and
> all comparisons within a segment are done by ord compare? We never do a
> cross-segment compare until the end, where a merge is done on the N
> instances using the Comparator (see the sketch below the quoted message).
> This way we avoid the value->ord lookup and can keep the simpler API,
> ScoreDocComparator, which is much easier to extend for custom sorting.
> There is a higher memory cost, but it is only an extra (N-1)*numHits
> copies of the data structure, where N is the number of segments and is
> not too large.
> I am probably missing some intricacies of the current sorting algorithm.
> If so, please shine some light.
> Thanks
> -John
>
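
A runnable sketch of the per-segment-queue idea quoted above
(hypothetical names and toy data, not a proposed Lucene API): each
segment gets its own small queue compared purely by ord, values are
materialized only once a segment finishes, and the final cross-segment
merge works over just N*numHits candidates.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Sketch only: N per-segment queues, ord-only compares inside a
// segment, value compares only in the final merge.
public class PerSegmentQueuesSketch {
    // One segment = its sorted term dictionary plus the per-doc ords.
    static class Segment {
        final String[] terms;  // sorted term values for this segment
        final int[] docOrds;   // each doc's ord into terms[]
        Segment(String[] terms, int[] docOrds) {
            this.terms = terms;
            this.docOrds = docOrds;
        }
    }

    public static void main(String[] args) {
        int numHits = 2;
        Segment[] segments = {
            new Segment(new String[] {"apple", "melon", "pear"}, new int[] {2, 0, 1, 2}),
            new Segment(new String[] {"fig", "kiwi"},            new int[] {1, 0, 1}),
        };

        List<String> candidates = new ArrayList<String>();
        for (Segment seg : segments) {
            // Per-segment top-numHits queue: pure int ord compares, no
            // value->ord lookups while collecting.
            PriorityQueue<Integer> queue = new PriorityQueue<Integer>();
            for (int ord : seg.docOrds) {
                if (queue.size() < numHits) {
                    queue.add(ord);
                } else if (ord > queue.peek()) {
                    queue.poll();
                    queue.add(ord);
                }
            }
            // Resolve ords to values only once the segment is done:
            // at most numHits lookups per segment.
            for (int ord : queue) candidates.add(seg.terms[ord]);
        }

        // Final cross-segment merge by value over only N*numHits entries.
        Collections.sort(candidates, Collections.reverseOrder());
        System.out.println("top-" + numHits + " values: " + candidates.subList(0, numHits));
    }
}

The memory trade-off noted in the quote shows up directly here: N small
queues are alive at once instead of one, but the per-conversion
value->ord binary search disappears entirely.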


