lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Wang <>
Subject Re: lucene 2.9 sorting algorithm
Date Fri, 23 Oct 2009 05:08:20 GMT
Hi Yonik:
    I have been head deep in this trying to find out a good solution for
better part of the past two days, it's been hard because there are so many

1) how optimized are the code from either of the implementations
2) VM difference
3) HW etc.

    Also, there are quite a few dimensions this issue is being discussed on:


    I think we should NOT jump to the conclusion that my number on the
multiQ is valid until others reproduce it (which is one of the reason I
asked mike to run his benchmark again with 1.5) I am gonna try to run it on
server machines when I get back to my office next week.

    Overall, I think the single Q algorithm is better. (It however does pay
a price for some string compares etc.), Its benefit becomes more and more
significant when the product of PQ size and segment count increases, which
makes complete sense from the algorithm. However, when PQ size is small
(which is in most of the cases, the multiplier on the segment count is also
small) the benefit is not as obvious. And sometimes the trade-off for the
constant string compare cost may not be worth it. (this remains a

    With Java 1.6, maybe the singleQ approach is a winner in all cases.

     I will spend more time to find out a more definitive answer.


    The new FieldComparator API is not difficult to understand (especially
for Lucene experts such as yourselves), but it is more involved in
comparison to the ScoreDocComparator API. I think anyone would agree with
that. Furthermore, when implementing some custom comparators, (examples I
have given earlier in this thread), it can be difficult to implement while
maintaining performance.

    I understand changing API is hard, that is why I am trying to raise this
as soon as possible, and it could very well be that the current API is fine.

    Lucene's collector api allows anyone to plugin any sorting algorithm,
kinda like what Mike has done with the tests. So it is ok if an API selected
does not fit the needs for everyone.

    In conclusion, please understand I am not trying to be "right" on this,
just trying to learn and to understand, which I did from reading and trying
to understand the code, along with guidance from Mike and Yonik and I am
more than impressed with the thoughts and code tuning that went into it.



On Thu, Oct 22, 2009 at 8:30 PM, Yonik Seeley <>wrote:

> On Thu, Oct 22, 2009 at 11:11 PM, Jake Mannix <>
> wrote:
> > It's hard to read the column format, but if you look up above in the
> thread
> > from tonight,
> > you can see that yes, for PQ sizes less than 100 elements, multiPQ is
> > better, and only
> > starts to be worse at around 100 for strings, and 50 for ints.
> Ah, OK, I had missed John's followup with the numbers.
> I assume this is for Java5 + optimizations?
> What does Java6 show?
> bq. Point 2 does indeed make a difference, we've seen it, and it's
> only fair: the single pq comparator does this branch optimization but
> the current patch multi-pq does not, so let's level the playing field.
> Of course - it's not about leveling the playing field, but finding the
> best solution for the average case - so everything should be optimized
> as much as possible.  There are probably further optimizations
> possible in both the single and multi PQ cases.
> My biggest reservation is that we've gone down the road of telling
> people to implement a new style of comparators, and told them that the
> old style comparators would be deleted in the next release (which is
> where we are).  Reversing that will be a bit of a headache/question...
> the new stuff isn't deprecated, and having *both* isn't desirable, but
> that's a separate decision to be made apart from performance testing.
> Is there also an option of using a multiPQ approach with the new style
> comparators?
> -Yonik
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

View raw message