lucene-dev mailing list archives

From John Wang <>
Subject Re: lucene 2.9 sorting algorithm
Date Tue, 20 Oct 2009 15:56:28 GMT
Sorry, mistyped again: we have a multivalued field of STRINGS, not integers.

On Tue, Oct 20, 2009 at 8:55 AM, John Wang <> wrote:

> Hi guys:
>     I am not suggesting simply changing the deprecated signatures.
> There is some work to be done, of course. At the beginning of the thread, we
> discussed two algorithms (both handling per-segment field loading), and the
> conclusion (still to be verified by Mike) was that both algorithms perform
> the same. (We do see that as the queue size increases, the performance cost
> increases more for the single Q approach, the one in the trunk, than for the
> multiQ approach; please see the numbers I posted earlier in this thread.)
>     However, the multiQ algorithm would allow us to keep the old, simpler
> api, and the simpler api places fewer restrictions on the type of custom
> sorting that can be done.
> Let me provide an example:
>     We have a multi valued field on integers. We define a sort on this set
> of strings by defining a comparator similar to a lex order, except that
> instead of comparing characters we compare strings. We also want to keep
> the multi value representation, as we do filtering and facet counting on
> it. The in-memory representation is similar to the UnInvertedField in Solr.
>    Implementing a sort with the old API was rather simple, as we only
> needed a mapping from a docid to a set of ordinals. With the new api, we
> needed to do a "conversion", which would mean mapping a set of
> Strings/ordinals back to a doc. That, to me, is not trivial, let alone the
> performance implications.
>    That actually gave us the motivation to see if the old api can handle the
> segment level changes that were made in 2.9 (which in my opinion is the best
> thing in lucene since payloads :) )
>    So after some investigation, with code and big O analysis, and
> discussions with Mike and Yonik, we feel on our end that, given the
> performance numbers, it is unnecessary to go with the more complicated API.
> Thanks
> -John
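To make the example in the message above concrete, here is a minimal sketch of sorting documents from a docid-to-ordinals mapping. It is not the code discussed in the thread: the class name `MultiValueOrdinalSort`, the `int[][]` layout (one ascending array of term ordinals per document), and the rule that a shorter set sorts before a longer one with the same prefix are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: compare documents by their sets of term ordinals,
// the way a lexicographic order compares words character by character.
class MultiValueOrdinalSort {

    // ordinals[doc] is assumed to hold that document's term ordinals in
    // ascending order; the ordinals index one shared, sorted term dictionary.
    static Comparator<Integer> byOrdinals(final int[][] ordinals) {
        return new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                int[] x = ordinals[a];
                int[] y = ordinals[b];
                int n = Math.min(x.length, y.length);
                for (int i = 0; i < n; i++) {
                    if (x[i] != y[i]) {
                        return x[i] < y[i] ? -1 : 1;
                    }
                }
                // Same prefix: the document with fewer values sorts first (assumption).
                return Integer.compare(x.length, y.length);
            }
        };
    }

    public static void main(String[] args) {
        int[][] ordinals = { {2, 5}, {2}, {1, 9} };
        Integer[] docs = { 0, 1, 2 };
        Arrays.sort(docs, byOrdinals(ordinals));
        System.out.println(Arrays.toString(docs));
    }
}
```

The comparator only ever needs the mapping from a docid to its ordinals, which is the property the message attributes to the old API.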
> On Tue, Oct 20, 2009 at 6:00 AM, Mark Miller <> wrote:
>> Actually though - how are we supposed to get back there? I don't think
>> it's as simple as just not removing the deprecated APIs. Doesn't even
>> seem close to that simple. It's another nightmare. It would have to be
>> some serious wins to go through that pain starting at a 3.0 release,
>> wouldn't it? We just had a ton of people switch. We would have to
>> deprecate a bunch of stuff. Hard to imagine wanting to switch now - the
>> new API is certainly not that bad.
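
For readers following the single Q versus multiQ comparison quoted above, the multiQ idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation benchmarked in the thread: `Segment`, `Hit`, and `topN` are hypothetical names, each document is assumed to carry one segment-local ordinal, and `n` is assumed to be at least 1. Each segment gets its own bounded queue that compares cheap local ordinals, and only the survivors are converted to comparable values for the final merge.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a per-segment ("multiQ") top-N sort.
class MultiQueueSketch {

    // Assumed segment layout: ords[doc] indexes this segment's sorted terms array.
    static final class Segment {
        final int docBase;
        final int[] ords;
        final String[] terms;
        Segment(int docBase, int[] ords, String[] terms) {
            this.docBase = docBase;
            this.ords = ords;
            this.terms = terms;
        }
    }

    static final class Hit {
        final int doc;       // global doc id
        final String value;  // value comparable across segments
        Hit(int doc, String value) {
            this.doc = doc;
            this.value = value;
        }
    }

    static List<Hit> topN(List<Segment> segments, int n) {
        List<Hit> candidates = new ArrayList<Hit>();
        for (final Segment seg : segments) {
            // Max-heap on the local ordinal: the head is the worst hit kept so far.
            PriorityQueue<Integer> queue = new PriorityQueue<Integer>(n,
                    (a, b) -> Integer.compare(seg.ords[b], seg.ords[a]));
            for (int doc = 0; doc < seg.ords.length; doc++) {
                if (queue.size() < n) {
                    queue.add(doc);
                } else if (seg.ords[doc] < seg.ords[queue.peek()]) {
                    queue.poll();
                    queue.add(doc);
                }
            }
            // Ordinals are only meaningful inside one segment, so convert the
            // survivors to their string values before merging.
            for (int doc : queue) {
                candidates.add(new Hit(seg.docBase + doc, seg.terms[seg.ords[doc]]));
            }
        }
        // Merge step: at most n candidates per segment are compared by value.
        candidates.sort(Comparator.comparing((Hit h) -> h.value)
                .thenComparingInt(h -> h.doc));
        return new ArrayList<Hit>(candidates.subList(0, Math.min(n, candidates.size())));
    }
}
```

In this sketch the string comparisons are confined to the merge step, which touches at most `n` hits per segment; the per-document work inside a segment uses integer comparisons only.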
