lucene-dev mailing list archives

From John Wang <john.w...@gmail.com>
Subject Re: lucene 2.9 sorting algorithm
Date Tue, 20 Oct 2009 15:56:28 GMT
Sorry, mistyped again: we have a multivalued field of STRINGS, not integers.
-John

On Tue, Oct 20, 2009 at 8:55 AM, John Wang <john.wang@gmail.com> wrote:

> Hi guys:
>     I am not suggesting simply changing the deprecated signatures.
> There is some work to be done, of course. At the beginning of the thread, we
> discussed two algorithms (both handling per-segment field loading), and the
> conclusion (still to be verified by Mike) was that both algorithms perform
> the same. (We do see that once the queue size increases, the performance cost
> increases more for the single Q approach, the one in the trunk, than for the
> multiQ approach; please see the numbers I posted earlier in this thread.)
>
>     However, the multiQ algorithm would allow us to keep the old, simpler
> API, and the simpler API places fewer restrictions on the type of custom
> sorting that can be done.
>
> Let me provide an example:
>
>     We have a multi-valued field on integers. We define a sort on this set
> of strings by defining a comparator on each value that is similar to a lex
> order, except that instead of comparing characters we compare strings. We
> also want to keep the multi-value representation, as we do filtering and
> facet counting on it. The in-memory representation is similar to the
> UnInvertedField in Solr.
>
>    Implementing a sort with the old API was rather simple, as we only
> needed a mapping from a docid to a set of ordinals. With the new API, we
> needed to do a "conversion", which would mean mapping a set of
> String/ordinals back to a doc. That, to me, is not trivial, let alone the
> performance implications.
>
>    That actually gave us the motivation to see if the old API can handle the
> segment-level changes that were made in 2.9 (which in my opinion is the best
> thing in Lucene since payloads :) )
>
>    So after some investigation, with code and big-O analysis, and
> discussions with Mike and Yonik, on our end we feel that, given the
> performance numbers, it is unnecessary to go with the more complicated API.
>
> Thanks
>
> -John
>
>
>
> On Tue, Oct 20, 2009 at 6:00 AM, Mark Miller <markrmiller@gmail.com> wrote:
>
>> Actually though - how are we supposed to get back there? I don't think
>> it's as simple as just not removing the deprecated APIs. Doesn't even
>> seem close to that simple. It's another nightmare. It would have to be
>> some serious wins to go through that pain starting at a 3.0 release,
>> wouldn't it? We just had a ton of people switch. We would have to
>> deprecate a bunch of stuff. Hard to imagine wanting to switch now - the
>> new API is certainly not that bad.
>
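
To make the multi-valued sort described above concrete, here is a minimal
sketch in plain Java. It assumes each doc maps to a set of ordinals into a
sorted term dictionary, and it compares two docs lexicographically, treating
each whole string value as one "character". MultiValuedField, compare and
sortedDocs are hypothetical names for illustration only; none of this is
Lucene or Solr code, and the real in-memory structure is surely more compact.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical in-memory view of one multi-valued string field (not a Lucene class). */
final class MultiValuedField {
    private final String[] sortedTerms; // ordinal -> term, in ascending string order
    private final int[][] docOrds;      // docid -> ascending ordinals of that doc's values

    MultiValuedField(String[][] docValues) {
        // Term dictionary: the distinct values across all docs, sorted.
        TreeSet<String> terms = new TreeSet<>();
        for (String[] values : docValues) {
            terms.addAll(Arrays.asList(values));
        }
        String[] dict = terms.toArray(new String[0]);

        // Replace each doc's values by their ordinals, kept in ascending order.
        int[][] ords = new int[docValues.length][];
        for (int doc = 0; doc < docValues.length; doc++) {
            TreeSet<Integer> docSet = new TreeSet<>();
            for (String value : docValues[doc]) {
                docSet.add(Arrays.binarySearch(dict, value));
            }
            ords[doc] = docSet.stream().mapToInt(Integer::intValue).toArray();
        }
        this.sortedTerms = dict;
        this.docOrds = ords;
    }

    /** Lex-style order over value sets; ordinals stand in for the strings. */
    int compare(int docA, int docB) {
        int[] a = docOrds[docA];
        int[] b = docOrds[docB];
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length); // a prefix sorts first
    }

    /** All docids, ordered by compare(). */
    Integer[] sortedDocs() {
        Integer[] docs = new Integer[docOrds.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, this::compare);
        return docs;
    }

    String term(int ordinal) {
        return sortedTerms[ordinal];
    }
}
```

Note that the ordinals are only comparable within one dictionary, so a
comparison across segments would have to fall back to the strings themselves.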

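For readers who have not followed the whole thread, the multiQ idea can be
sketched roughly as follows: keep one bounded queue per segment, then merge
the per-segment winners at the end. This is only my reading of the approach;
the details were given earlier in the thread and are not repeated here.
MultiQueueTopN and topN are hypothetical names, segments are modelled as plain
lists of strings, and values are compared directly rather than through
segment-local ordinals to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a per-segment-queue top-N; not the Lucene implementation. */
final class MultiQueueTopN {

    /** Returns the n smallest values across all segments, in ascending order. */
    static List<String> topN(List<List<String>> segments, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Phase 1: one bounded max-heap per segment. The head is the worst
        // value currently kept, so it is the one to evict.
        List<PriorityQueue<String>> queues = new ArrayList<>();
        for (List<String> segment : segments) {
            PriorityQueue<String> pq =
                    new PriorityQueue<String>(n, Comparator.reverseOrder());
            for (String value : segment) {
                if (pq.size() < n) {
                    pq.add(value);
                } else if (value.compareTo(pq.peek()) < 0) {
                    pq.poll();
                    pq.add(value);
                }
            }
            queues.add(pq);
        }
        // Phase 2: merge the per-segment winners and keep the global top n.
        List<String> merged = new ArrayList<>();
        for (PriorityQueue<String> pq : queues) {
            merged.addAll(pq);
        }
        Collections.sort(merged);
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }
}
```

The merge step touches at most n entries per segment, so its cost depends on
the queue size and the number of segments, not on the number of hits.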