lucene-dev mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: lucene 2.9 sorting algorithm
Date Fri, 23 Oct 2009 03:53:53 GMT
On Thu, Oct 22, 2009 at 8:30 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:

> On Thu, Oct 22, 2009 at 11:11 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> > It's hard to read the column format, but if you look up above in the
> > thread from tonight, you can see that yes, for PQ sizes less than 100
> > elements, multiPQ is better, and only starts to be worse at around 100
> > for strings, and 50 for ints.
>
> Ah, OK, I had missed John's followup with the numbers.
>
> I assume this is for Java5 + optimizations?
>

Yeah, this was for Java5 + optimizations.


> What does Java6 show?
>

Java6 on Mac showed close to what Mike posted in his report on the Jira
ticket: that single-PQ performs a little better for small pq, and more
like 30-40% better for large pq.
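
For readers following along, here is a minimal sketch of the two strategies being compared. It assumes each segment is just an array of ints and that "top N" means the N smallest values; the class and method names (`TopNSketch`, `singlePq`, `multiPq`) are made up for illustration and are not Lucene code. It only shows the shape of each approach and says nothing about the relative costs measured in this thread, which depend on how values are compared within and across segments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {

    // Single-PQ: one bounded heap is shared by every segment.
    public static List<Integer> singlePq(int[][] segments, int n) {
        // Max-heap, so the head is the weakest value currently kept.
        PriorityQueue<Integer> pq = new PriorityQueue<>(n, Collections.reverseOrder());
        for (int[] segment : segments) {
            for (int value : segment) {
                offer(pq, value, n);
            }
        }
        return sortedCopy(pq);
    }

    // Multi-PQ: one bounded heap per segment, merged into a final heap at the end.
    public static List<Integer> multiPq(int[][] segments, int n) {
        PriorityQueue<Integer> merged = new PriorityQueue<>(n, Collections.reverseOrder());
        for (int[] segment : segments) {
            PriorityQueue<Integer> perSegment = new PriorityQueue<>(n, Collections.reverseOrder());
            for (int value : segment) {
                offer(perSegment, value, n);
            }
            // Merge step: each segment contributes at most n candidates.
            for (int value : perSegment) {
                offer(merged, value, n);
            }
        }
        return sortedCopy(merged);
    }

    // Keep at most n values; replace the weakest one when a better value arrives.
    private static void offer(PriorityQueue<Integer> pq, int value, int n) {
        if (pq.size() < n) {
            pq.add(value);
        } else if (value < pq.peek()) {
            pq.poll();
            pq.add(value);
        }
    }

    private static List<Integer> sortedCopy(PriorityQueue<Integer> pq) {
        List<Integer> out = new ArrayList<>(pq);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        int[][] segments = { {42, 7, 19}, {3, 88, 21}, {15, 1, 60} };
        System.out.println(singlePq(segments, 4)); // prints [1, 3, 7, 15]
        System.out.println(multiPq(segments, 4));  // prints [1, 3, 7, 15]
    }
}
```

Both methods return the same result; the difference is only in where the comparisons happen (against one shared queue, or inside a per-segment queue followed by a merge).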


> My biggest reservation is that we've gone down the road of telling
> people to implement a new style of comparators, and told them that the
> old style comparators would be deleted in the next release (which is
> where we are).  Reversing that will be a bit of a headache/question...
> the new stuff isn't deprecated, and having *both* isn't desirable, but
> that's a separate decision to be made apart from performance testing.
>

Well, the issue comes down to this: if the performance is *basically
comparable* between the two approaches, then the new API is much harder
for the average user to use, and even for the experienced user it's not
terribly fun. More importantly, for the user who has already implemented
custom sorts on the old API, upgrading is enough trouble that people may
decide it's not worth it.  It probably *is* worth it, but if you're going
to even put that kind of thinking in the user's head, you've got to ask
yourself: what's the reasoning for going with a more complex API if you
can get equal (slightly better in some cases, slightly worse in others)
performance with a simpler API?
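
To make the complexity gap concrete, here is a rough sketch of the two comparator shapes under discussion. The interfaces below (`DocComparator`, `SlotComparator`) are hypothetical simplifications written for this illustration, only loosely modeled on the old document-pair style and the new slot-based, per-segment style; they are not Lucene's actual classes, and sort values are assumed to be plain ints.

```java
public class ComparatorShapes {

    // Hypothetical old-style shape: compare two documents directly.
    public interface DocComparator {
        int compare(int docA, int docB);
    }

    // Hypothetical new-style shape: values are copied into queue slots,
    // and the comparator is told when the search moves to a new segment.
    public interface SlotComparator {
        void setNextSegment(int[] segmentValues); // switch to the next segment's values
        void copy(int slot, int doc);             // remember doc's value in a queue slot
        int compare(int slot1, int slot2);        // compare two remembered values
        void setBottom(int slot);                 // mark the weakest entry in the queue
        int compareBottom(int doc);               // compare the weakest entry to a candidate doc
    }

    // Old style: one method, backed by an index-wide array of values.
    public static DocComparator oldStyle(final int[] values) {
        return (a, b) -> Integer.compare(values[a], values[b]);
    }

    // New style: five methods plus state the implementer has to manage.
    public static SlotComparator newStyle(final int numSlots) {
        return new SlotComparator() {
            private final int[] slots = new int[numSlots];
            private int[] current;
            private int bottom;

            public void setNextSegment(int[] segmentValues) { current = segmentValues; }
            public void copy(int slot, int doc) { slots[slot] = current[doc]; }
            public int compare(int slot1, int slot2) { return Integer.compare(slots[slot1], slots[slot2]); }
            public void setBottom(int slot) { bottom = slots[slot]; }
            public int compareBottom(int doc) { return Integer.compare(bottom, current[doc]); }
        };
    }
}
```

Even in this stripped-down form, a custom sort in the second shape has to track slots, the current segment, and the bottom entry, where the first shape needs a single comparison method.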

Yes, as Mike says, the new API is *not* breaking back-compat in a
functional sense, but how many users have converted to the new sorting
API already?  2.9 has only just come out, and while it's work for the
community as a whole to reconsider the multi-segment sorting API, and
work to implement a change at this level, if it's the right thing to do,
we shouldn't let the question of which method is currently deprecated
dictate which one *should* be deprecated.


> Is there also an option of using a multiPQ approach with the new style
> comparators?
>

For the record: that would be the worst of all worlds, in my view: a harder
API with better performance only in some cases, and sometimes worse
performance.

  -jake
