lucene-dev mailing list archives

From John Wang <>
Subject Re: lucene 2.9 sorting algorithm
Date Tue, 20 Oct 2009 15:55:05 GMT
Hi guys:
    I am not suggesting simply changing the deprecated signatures.
There is some work to be done, of course. At the beginning of the thread we
discussed two algorithms (both handling per-segment field loading) and
concluded (still to be verified by Mike) that both algorithms perform the
same. (We do see that as the queue size increases, the cost grows more for
the single-queue approach, the one in trunk, than for the multi-queue
(multiQ) approach; please see the numbers I posted earlier in this thread.)
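For readers who have not followed the whole thread, the multi-queue idea can be sketched in plain Java without any Lucene classes: each segment fills its own bounded priority queue, and the per-segment winners are merged at the end. This is only a minimal illustration under stated assumptions (one int sort key per document, ascending order, k >= 1 per segment queue); the class and method names are hypothetical, and it is not the code in trunk or in the patches discussed here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MultiQueueTopK {

    // A hit: global doc id plus the (hypothetical) int value it is sorted on.
    record Hit(int doc, int value) {}

    // Ascending by value; ties broken by doc id so the order is deterministic.
    static final Comparator<Hit> ORDER =
        Comparator.comparingInt(Hit::value).thenComparingInt(Hit::doc);

    // Collect the k best hits of one segment into that segment's own queue.
    static List<Hit> topKForSegment(int[] values, int docBase, int k) {
        // Max-heap under ORDER: the head is the worst hit kept so far.
        PriorityQueue<Hit> pq = new PriorityQueue<>(ORDER.reversed());
        for (int local = 0; local < values.length; local++) {
            Hit h = new Hit(docBase + local, values[local]);
            if (pq.size() < k) {
                pq.add(h);
            } else if (ORDER.compare(h, pq.peek()) < 0) {
                pq.poll(); // evict the current worst
                pq.add(h);
            }
        }
        return new ArrayList<>(pq);
    }

    // One queue per segment, then a single merge of the per-segment winners.
    static List<Hit> topK(List<int[]> segments, int k) {
        if (k <= 0) {
            return List.of();
        }
        List<Hit> candidates = new ArrayList<>();
        int docBase = 0;
        for (int[] seg : segments) {
            candidates.addAll(topKForSegment(seg, docBase, k));
            docBase += seg.length;
        }
        candidates.sort(ORDER);
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        List<int[]> segments = List.of(new int[] {5, 1, 9}, new int[] {3, 7}, new int[] {2});
        System.out.println(topK(segments, 3));
    }
}
```

Within a segment's queue, comparisons only touch that segment's data, so per-segment ordinals could be compared directly; only the final merge needs values that are comparable across segments (plain ints in this sketch).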

    However, the multiQ algorithm would allow us to keep the old, simpler
API, and the simpler API places fewer restrictions on the kinds of custom
sorting that can be done.

Let me provide an example:

    We have a multi-valued field of integers. We define a sort on each
document's set of values with a comparator similar to lexicographic order:
instead of comparing character by character, we compare value by value. We
also want to keep the multi-valued representation, since we do filtering and
facet counting on it. The in-memory representation is similar to Solr's
UnInvertedField.
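The comparator described above can be sketched as follows. This is a hypothetical illustration, assuming each document's values are held as an int[] in a fixed (for example ascending) order; it simply applies string-style lexicographic comparison with whole values in place of characters.

```java
import java.util.Arrays;

public class ValueSetComparator {

    // Compare two documents' value sequences the way strings are compared
    // lexicographically, except that each "character" is a whole value.
    static int compareValueSets(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) {
                return c; // the first differing value decides
            }
        }
        // All shared positions are equal: the shorter sequence sorts first.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        int[][] docs = { {3, 7}, {3}, {2, 9, 9}, {3, 5} };
        Arrays.sort(docs, ValueSetComparator::compareValueSets);
        System.out.println(Arrays.deepToString(docs)); // [[2, 9, 9], [3], [3, 5], [3, 7]]
    }
}
```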

   Implementing a sort with the old API was rather simple, since we only
needed a mapping from a docid to a set of ordinals. With the new API we had
to do a "conversion", which means mapping a set of Strings/ordinals back to a
doc. To me that is not trivial, let alone its performance implications.
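To make the "docid to a set of ordinals" point concrete, here is a minimal sketch of a docid-based comparator over an UnInvertedField-like structure. The class, its int[][] layout and the byDocId method are all hypothetical; the sketch assumes ordinals are assigned in term order within a segment, so comparing ordinal sequences gives the same order as comparing the terms. It uses java.util.Arrays.compare (Java 9+), which performs the same lexicographic int[] comparison.

```java
import java.util.Arrays;
import java.util.Comparator;

public class OrdinalDocComparator {

    // Hypothetical per-segment structure in the spirit of Solr's UnInvertedField:
    // ords[doc] holds that document's term ordinals in ascending order.
    // Ordinals follow term order, so within one segment comparing ordinals
    // orders documents the same way as comparing the terms themselves.
    private final int[][] ords;

    OrdinalDocComparator(int[][] ords) {
        this.ords = ords;
    }

    // Only a docid -> ordinals lookup is needed; nothing maps values back to docs.
    Comparator<Integer> byDocId() {
        return (docA, docB) -> Arrays.compare(ords[docA], ords[docB]);
    }

    public static void main(String[] args) {
        int[][] ords = { {1, 4}, {0, 2}, {1} };
        Integer[] docs = {0, 1, 2};
        Arrays.sort(docs, new OrdinalDocComparator(ords).byDocId());
        System.out.println(Arrays.toString(docs)); // [1, 2, 0]
    }
}
```

Because the ordinals are only meaningful inside one segment, a comparator like this fits naturally with one queue per segment.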

   That actually gave us the motivation to see whether the old API can handle
the segment-level changes made in 2.9 (which in my opinion are the best thing
in Lucene since payloads :) )

   So after some investigation, with code and big-O analysis, and discussions
with Mike and Yonik, we on our end feel that, given the performance numbers,
it is unnecessary to go with the more complicated API.



On Tue, Oct 20, 2009 at 6:00 AM, Mark Miller <> wrote:

> Actually though - how are we supposed to get back there? I don't think
> it's as simple as just not removing the deprecated APIs. It doesn't even
> seem close to that simple. It's another nightmare. There would have to be
> some serious wins to go through that pain starting at a 3.0 release,
> wouldn't there? We just had a ton of people switch. We would have to
> deprecate a bunch of stuff. Hard to imagine wanting to switch now - the
> new API is certainly not that bad.
