lucene-dev mailing list archives

From "Uwe Schindler" <>
Subject RE: lucene 2.9 sorting algorithm
Date Tue, 20 Oct 2009 12:49:25 GMT
> On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller <>
> wrote:
> > Hmm - perhaps I'm not remembering right. Or perhaps we had different
> > motivations ;) I never did anything in 1483 based on search perf - and I
> > took your tests as testing that we didn't lose perf, not that we gained
> > any. The fact that there were some wins was just a nice surprise from my
> > perspective.
> >
> > A quote from you in that issue:
> >
> > "I didn't expect such performance gain (I was hoping for not much
> > performance loss, actually). I think it may be that although the
> > initial value copy adds some cost, the within-queue comparisons are
> > then faster because you don't have to deref back to the fieldcache
> > array. It seems we keep accidentally discovering performance gains
> > here"
> >
> > My whole memory of that issue is that we didn't do anything for
> > performance gains. We just happened to measure a few. It was just to get
> > to per segment. Was a long time ago though.
> Right, our original motivation was fast reopen time, by doing
> searching (and collection) per-segment so that the field cache is only
> used at the segment level.
> But, that required cutting over field sorting, which was tricky.
> Our first go at it was the multi PQ approach (copying MultiSearcher),
> but I believe that showed poor performance.  I remember being
> depressed about it :)  So that poor performance pushed us to work out
> the new comparator API that uses a single PQ, and, after much
> iterating, we saw better performance net/net.

And the new sorting API is in line with the new Collector API! You have a
setNextReader() method, where you e.g. load the FieldCache for the next
segment and provide the compare functions; you can also get the scorer. My
question: what is so hard about using this API? OK, it's more work when
implementing the Comparator, but it is more intuitive to me if you think in
terms of per-segment searches. For new users, only the bottom comparison and
so on is strange; the rest is straightforward.
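To make the per-segment flow concrete, here is a minimal, self-contained
sketch. It is not the Lucene API: SegmentComparator, IntSegmentComparator and
TopNSketch are hypothetical names, a plain int[] per segment stands in for the
FieldCache, and a linear scan stands in for the real priority queue. Only the
method names (setNextReader, copy, compare, setBottom, compareBottom, value)
are meant to mirror the shape of the new comparator API.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in for a per-segment comparator (not the Lucene class). */
interface SegmentComparator {
    void setNextReader(int[] segmentValues); // stand-in for loading one segment's FieldCache
    void copy(int slot, int doc);            // copy the doc's value into a queue slot
    int compare(int slot1, int slot2);       // compare two queue slots
    void setBottom(int slot);                // remember the weakest queue entry
    int compareBottom(int doc);              // compare the weakest entry against a new doc
    int value(int slot);                     // read the value stored in a slot
}

final class IntSegmentComparator implements SegmentComparator {
    private final int[] slots;  // values copied out of the segments, one per queue slot
    private int[] current;      // values of the segment being searched
    private int bottom;         // value of the weakest queue entry

    IntSegmentComparator(int numHits) { slots = new int[numHits]; }

    public void setNextReader(int[] segmentValues) { current = segmentValues; }
    public void copy(int slot, int doc) { slots[slot] = current[doc]; }
    public int compare(int slot1, int slot2) { return Integer.compare(slots[slot1], slots[slot2]); }
    public void setBottom(int slot) { bottom = slots[slot]; }
    public int compareBottom(int doc) { return Integer.compare(bottom, current[doc]); }
    public int value(int slot) { return slots[slot]; }
}

final class TopNSketch {
    /** Returns the n smallest values across all segments, in ascending order. */
    static int[] topN(int[][] segments, int n) {
        if (n <= 0) return new int[0];
        SegmentComparator cmp = new IntSegmentComparator(n);
        int filled = 0;
        int bottomSlot = -1;
        for (int[] segment : segments) {
            cmp.setNextReader(segment); // switch segments; doc ids restart at 0
            for (int doc = 0; doc < segment.length; doc++) {
                if (filled < n) {
                    cmp.copy(filled++, doc);
                    if (filled == n) {
                        bottomSlot = worstSlot(cmp, n);
                        cmp.setBottom(bottomSlot);
                    }
                } else if (cmp.compareBottom(doc) > 0) {
                    // The new doc beats the weakest entry: overwrite it, then find the new bottom.
                    cmp.copy(bottomSlot, doc);
                    bottomSlot = worstSlot(cmp, n);
                    cmp.setBottom(bottomSlot);
                }
            }
        }
        int[] result = new int[filled];
        for (int i = 0; i < filled; i++) result[i] = cmp.value(i);
        Arrays.sort(result);
        return result;
    }

    // Linear scan for the largest slot; a real queue would keep a heap instead.
    private static int worstSlot(SegmentComparator cmp, int n) {
        int worst = 0;
        for (int s = 1; s < n; s++) {
            if (cmp.compare(s, worst) > 0) worst = s;
        }
        return worst;
    }
}
```

Because values are copied into slots, comparisons inside the queue never need
to look back into a segment that has already been left behind, which is what
lets a single queue span all segments.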

I do not know how this flexibility (scorer, reader switch) could be
implemented with the old API. If we want to switch back to a simpler API,
we should not switch back to the strange old one (I never understood it
completely; the new one I understand)! Maybe we can provide an easy-to-use
default implementation for Comparables in addition to custom sort, which may
help the many people who used Comparables with the old API. This impl may be
slower and more memory intensive than directly implementing the new API, but
it may help.
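One possible shape for such a default implementation, as a rough sketch only:
the user supplies nothing but a doc -> Comparable lookup per segment, and the
adapter handles slots and the bottom value. ComparableSlotComparator and its
IntFunction-based setNextReader are hypothetical and not part of Lucene; the
method names just follow the slot-based style of the new API. It holds one
object per slot and calls the lookup for every compared doc, so it would
likely be slower and heavier than a hand-written primitive comparator.

```java
import java.util.function.IntFunction;

/** Hypothetical adapter: wraps a per-segment doc -> Comparable lookup in a slot-based comparator. */
final class ComparableSlotComparator<T extends Comparable<T>> {
    private final Object[] slots;          // one Comparable per queue slot
    private IntFunction<T> segmentLookup;  // lookup for the segment being searched
    private T bottom;                      // value of the weakest queue entry

    ComparableSlotComparator(int numHits) { slots = new Object[numHits]; }

    /** Called on each segment switch; doc ids passed to the lookup are segment-relative. */
    void setNextReader(IntFunction<T> lookup) { segmentLookup = lookup; }

    void copy(int slot, int doc) { slots[slot] = segmentLookup.apply(doc); }

    int compare(int slot1, int slot2) { return value(slot1).compareTo(value(slot2)); }

    void setBottom(int slot) { bottom = value(slot); }

    int compareBottom(int doc) { return bottom.compareTo(segmentLookup.apply(doc)); }

    @SuppressWarnings("unchecked")
    T value(int slot) { return (T) slots[slot]; }
}
```

A caller would pass a fresh lookup for each segment, for example
`cmp.setNextReader(doc -> segmentValues[doc])`, where segmentValues is
whatever per-segment data the user already has.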

