lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adrien Grand <jpou...@gmail.com>
Subject Re: Merging ordered segments without re-sorting.
Date Wed, 23 Oct 2013 21:45:25 GMT
Hi,

On Wed, Oct 23, 2013 at 10:19 PM, Arvind Kalyan <base16@gmail.com> wrote:
> Sorting is not an option for our case so we will most likely implement a
> variant that merges the segments in one pass. Using TimSort is great but in
> our case the 2 segments will be highly interspersed and would not benefit
> from the galloping in TimSort.

I think Shai didn't mention TimSort because of galloping, but because
it tries to discover the monotonic sequences in your data. In your
particular case, this means that the sorting would run in linear time
instead of O(n log(n)) given that the 2 segments to merge are sorted.

> In additional, if anyone else on the list has any inputs on implementing
> the merge (without sort) I'd appreciate it as well! More than likely I'll
> have followup questions if we decide to go this route.

The tricky part is postings merging. You need to be able to remap your
doc ids to their new value in the merged segment and this requires
sorting the doc ids.

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message