lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: SortingAtomicReader alternate to Tim-Sort...
Date Wed, 06 May 2015 21:04:12 GMT
Sorry for the delay, I opened
https://issues.apache.org/jira/browse/LUCENE-6469. It can go to trunk
and 5.x (the value of x depending on when it's ready :)).

On Thu, Apr 30, 2015 at 9:02 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
>>
>> Would you like to submit a patch that changes SortingMergePolicy to
>> use the approach that you are proposing using bitsets instead of
>> sorting int[] arrays?
>
>
> Sure can do that. Can you open a ticket for this, as I don't know what
> versions this can go in?
>
> --
> Ravi
>
>
>
> On Tue, Apr 28, 2015 at 6:03 PM, Adrien Grand <jpountz@gmail.com> wrote:
>
>> On Tue, Apr 21, 2015 at 10:00 AM, Ravikumar Govindarajan
>> <ravikumar.govindarajan@gmail.com> wrote:
>> > Thanks for the comments…
>> >
>> >> My only concern about using the FixedBitSet is that it would make
>> >> sorting each postings list run in O(maxDoc) but maybe we can make it
>> >> better by using SparseFixedBitSet
>> >
>> >
>> > Yes I was also thinking about this. But we are on 4.x and did not take
>> > the plunge. But as you said, it should be a good idea to test on SFBS
>>
>> Would you like to submit a patch that changes SortingMergePolicy to
>> use the approach that you are proposing using bitsets instead of
>> sorting int[] arrays?
>>
>> >> I'm curious if you already performed any kind of benchmarking of this
>> >> approach?
>> >
>> >
>> > Yes we did a stress test of sorts aimed at SortingMergePolicy. We made
>> > most of our data as RAM resident and then CPU hot-spots came up...
>> >
>> > There were a few take-aways from the test. I am listing a few of them.
>> > It's kind of lengthy. Please read through...
>> >
>> > a) Postings-List issue, as discussed above…
>> >
>> > b) CompressingStoredFieldsReader did not store the last decoded 32KB
>> > chunk. Our segments are already sorted before participating in a merge.
>> > On mostly linear merge, we ended up decoding the same chunk again and
>> > again. Simply storing the last chunk resulted in good speed-ups for us...
>> >
>> > c) Once the above steps were corrected, the CPU hotspot shifted to
>> > BlockDocsEnum. Here most of our postings lists have < 128 docs. So
>> > Lucene41Postings started using vInts…  I did try ForUtil encoding even
>> > for < 128 docs. It definitely went easy on CPU. But I failed to measure
>> > the resulting file-size increase.
>> >
>> > I realised not just SMP but any other merge must face the same issue and
>> > left it at that..
>>
>> True. Like Robert said, there has been work done on b) already and I
>> think we can move forward on a) too. Thanks for sharing your findings!
>>
>> --
>> Adrien
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
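For point a) in the quoted discussion, here is a minimal sketch of the
bitset idea: remap each doc ID of a postings list through the sort order,
mark it in a bitset, then read the set bits back in ascending order instead
of sorting an int[]. This is only an illustration under assumptions, not
the patch from LUCENE-6469: java.util.BitSet stands in for Lucene's
FixedBitSet, and the class, method and parameter names (BitsetDocIdSort,
remapAndSort, oldToNew) are hypothetical. It also ignores frequencies and
positions, which a real implementation would have to carry along.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical helper, not part of Lucene.
final class BitsetDocIdSort {

    /**
     * Remaps every doc ID in 'postings' through 'oldToNew' and returns the
     * new doc IDs in ascending order. Assumes 'oldToNew' is a permutation of
     * [0, maxDoc) and that 'postings' holds no duplicate doc IDs.
     */
    static int[] remapAndSort(int[] postings, int[] oldToNew, int maxDoc) {
        BitSet seen = new BitSet(maxDoc);
        for (int oldDoc : postings) {
            seen.set(oldToNew[oldDoc]);
        }
        // Walking the set bits yields the doc IDs already sorted. The walk
        // touches the whole bitset, which is the O(maxDoc) cost raised in
        // the thread.
        int[] sorted = new int[postings.length];
        int i = 0;
        for (int doc = seen.nextSetBit(0); doc >= 0; doc = seen.nextSetBit(doc + 1)) {
            sorted[i++] = doc;
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] oldToNew = {4, 0, 3, 1, 2};
        int[] postings = {0, 2, 3};
        // prints [1, 3, 4]
        System.out.println(Arrays.toString(remapAndSort(postings, oldToNew, 5)));
    }
}
```

For short postings lists in a large segment the bitset walk dominates,
which is why a sparse bitset was suggested as a possible improvement.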

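For point b), the "store the last decoded chunk" idea can be sketched as a
one-entry cache in front of the decompression step. This is not the code of
CompressingStoredFieldsReader; LastChunkCache, the decoder function and the
decodeCount counter are hypothetical names used only to show why a mostly
linear merge benefits from remembering the previous chunk.

```java
import java.util.function.IntFunction;

// Hypothetical one-entry cache, not part of Lucene.
final class LastChunkCache {
    private final IntFunction<byte[]> decoder; // assumed: decompresses a chunk by index
    private int cachedChunk = -1;              // -1 means nothing cached yet
    private byte[] cachedBytes;
    int decodeCount = 0;                       // counts real decompressions, for illustration

    LastChunkCache(IntFunction<byte[]> decoder) {
        this.decoder = decoder;
    }

    /** Returns the decoded bytes of a chunk, decoding only when the chunk changes. */
    byte[] chunk(int chunkIndex) {
        if (chunkIndex != cachedChunk) {
            cachedBytes = decoder.apply(chunkIndex);
            cachedChunk = chunkIndex;
            decodeCount++;
        }
        return cachedBytes;
    }
}
```

When documents are read in nearly sequential order, consecutive reads hit
the same chunk, so each chunk is decoded roughly once instead of once per
document.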

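For point c), the file-size question could be approached with a rough
estimate comparing vInt encoding (7 payload bits per byte) against
fixed-width bit packing of the same doc ID deltas. The sketch below is a
simplified sizing model and not Lucene's ForUtil or Lucene41 postings
format; PostingsSizeEstimate, its methods and the one-byte header for the
bit width are assumptions made for illustration.

```java
// Hypothetical sizing model, not part of Lucene.
final class PostingsSizeEstimate {

    /** Bytes needed to write one non-negative int as a vInt (7 bits per byte). */
    static int vIntBytes(int value) {
        int bytes = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    /** Total vInt bytes for a list of non-negative deltas. */
    static int vIntTotal(int[] deltas) {
        int total = 0;
        for (int d : deltas) {
            total += vIntBytes(d);
        }
        return total;
    }

    /**
     * Bytes for packing all deltas with the bit width of the largest one,
     * plus one assumed header byte that records the width.
     */
    static int packedTotal(int[] deltas) {
        int or = 0;
        for (int d : deltas) {
            or |= d;
        }
        int bitsPerValue = Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
        return 1 + (deltas.length * bitsPerValue + 7) / 8;
    }
}
```

Running both totals over real delta lists from an index would give a first
indication of how much the file size moves in either direction.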

-- 
Adrien