lucene-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: lucene 2.9 sorting algorithm
Date Tue, 20 Oct 2009 12:55:15 GMT
Uwe Schindler wrote:
>> On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller <markrmiller@gmail.com>
>> wrote:
>>     
>>> Hmm - perhaps I'm not remembering right. Or perhaps we had different
>>> motivations ;) I never did anything in 1483 based on search perf - and I
>>> took your tests as testing that we didn't lose perf, not that we gained
>>> any. The fact that there were some wins was just a nice surprise from my
>>> perspective.
>>>
>>> A quote from you in that issue:
>>>
>>> "I didn't expect such performance gain (I was hoping for not much
>>> performance loss, actually). I think it may be that although the
>>> initial value copy adds some cost, the within-queue comparisons are
>>> then faster because you don't have to deref back to the fieldcache
>>> array. It seems we keep accidentally discovering performance gains
>>> here"
>>>
>>> My whole memory of that issue is that we didn't do anything for
>>> performance gains. We just happened to measure a few. It was just to get
>>> to per segment. Was a long time ago though.
>>>       
>> Right, our original motivation was fast reopen time, by doing
>> searching (and collection) per-segment so that the field cache is
>> only used at the segment level.
>>
>> But, that required cutting over field sorting, which was tricky.
>>
>> Our first go at it was the multi PQ approach (copying MultiSearcher),
>> but I believe that showed poor performance.  I remember being
>> depressed about it :)  So that poor performance pushed us to work out
>> the new comparator API that uses a single PQ, and, after much
>> iterating, we saw better performance net/net.
>>     
>
> And the new sorting API is in line with the new Collector API! You have a
> setNextReader() method, where you e.g. load the FieldCache for the next
> segment and provide the compare functions, also you can get the scorer. My
> question: What is so hard about using this API? OK, it's more work when
> implementing the Comparator, but it is more intuitive for me if you think in
> terms of per-segment searches. For new users only the bottom comparison and
> so on is strange, the other is straightforward.
>
> I do not know how this flexibility can be implemented with the old API
> (scorer, reader switch)? If we want to switch back to a simpler API,
> we should not switch back to the strange old one (I never understood it
> completely, the new one I understand)! Maybe we can provide an easy-to-use
> default implementation for Comparables in addition to custom sort, which may
> help lots of people that used Comparables with the old API. This impl may be
> slower and more memory intensive than directly implementing the new API, but
> may help.
>
> Uwe
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>   
That wording had me caught up too, Uwe - "switch back to the old API" -
brings a lot of baggage with it :) More accurately, they mean switch to
using multiple p-queues rather than a single p-queue. The switch to
using a single p-queue is why we had to bring in setNextReader and all
of that to begin with. The original approach was actually to work as
MultiSearcher works and do a merge after using a p-queue for each
segment. So if we went back to that, many of the "new apis" wouldn't be
needed anymore.
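
To make the contrast concrete, here is a minimal sketch in plain Java. It
is not Lucene code: the class and method names are hypothetical, and each
int[] simply stands in for the sort values of one segment. The first method
keeps a queue per segment and merges afterwards (the MultiSearcher-style
approach); the second shares a single queue across segments, which is the
point where a per-segment hook such as setNextReader becomes necessary in
the real API.

```java
import java.util.Collections;
import java.util.PriorityQueue;

/** Hypothetical illustration only; none of these names exist in Lucene. */
public class TopNSketch {

    /** Multi-queue approach: one bounded queue per segment, then a merge step. */
    public static int[] multiQueueTopN(int[][] segments, int n) {
        PriorityQueue<Integer> merged = new PriorityQueue<>(Collections.reverseOrder());
        for (int[] segment : segments) {
            // Each segment collects its own top n independently.
            PriorityQueue<Integer> perSegment = new PriorityQueue<>(Collections.reverseOrder());
            for (int value : segment) {
                offerBounded(perSegment, value, n);
            }
            // Merge this segment's winners into the overall result.
            for (int value : perSegment) {
                offerBounded(merged, value, n);
            }
        }
        return drainAscending(merged);
    }

    /** Single-queue approach: every segment feeds the same bounded queue. */
    public static int[] singleQueueTopN(int[][] segments, int n) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(Collections.reverseOrder());
        for (int[] segment : segments) {
            // Assumption: this segment boundary is where a real comparator would
            // be told about the next reader so it can load per-segment values.
            for (int value : segment) {
                offerBounded(queue, value, n);
            }
        }
        return drainAscending(queue);
    }

    /** Keeps only the n smallest values; the head of the max-heap is the current "bottom". */
    private static void offerBounded(PriorityQueue<Integer> queue, int value, int n) {
        if (n <= 0) {
            return;
        }
        if (queue.size() < n) {
            queue.offer(value);
        } else if (value < queue.peek()) {
            queue.poll();
            queue.offer(value);
        }
    }

    /** Empties the max-heap into an array sorted in ascending order. */
    private static int[] drainAscending(PriorityQueue<Integer> queue) {
        int[] out = new int[queue.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = queue.poll();
        }
        return out;
    }
}
```

Both methods return the same top n; the difference is only in how many
queues are alive and whether a merge step is needed at the end.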

I agree that the new API is not too dreadful, but I think part of that
may be from being a more advanced user. Let's just not go back to caching
comparators :)

-- 
- Mark

http://www.lucidimagination.com