lucene-dev mailing list archives

From John Wang <john.w...@gmail.com>
Subject Re: lucene 2.9 sorting algorithm
Date Fri, 16 Oct 2009 04:57:02 GMT
Hi Michael:
    I added two classes, ScoreDocComparatorQueue and OneSortNoScoreCollector,
to cover the more general case. I think keeping the old APIs for
ScoreDocComparator and SortComparatorSource would work.

  Please take a look.

Thanks

-John
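For readers following along, here is a minimal Java sketch of why a ScoreDocComparator-style API is enough to drive a top-N queue: the queue only ever needs a pairwise "does doc A sort before doc B" comparison. DocComparator and BoundedDocQueue are hypothetical stand-ins, not John's ScoreDocComparatorQueue and not Lucene classes, and the example uses plain int doc ids and an in-memory value array in place of a real FieldCache.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: a bounded top-N queue driven only by a pairwise doc
// comparator, in the spirit of the old ScoreDocComparator API. Not Lucene code.
public class ComparatorQueueSketch {

    /** Hypothetical stand-in for a ScoreDocComparator-style contract. */
    interface DocComparator {
        /** Negative if docA sorts before docB, positive if after, 0 if tied. */
        int compare(int docA, int docB);
    }

    /** Keeps the best numHits docs seen so far according to cmp. */
    static final class BoundedDocQueue {
        private final int numHits;
        private final DocComparator cmp;
        // Reversed order: the head is the *worst* doc currently kept,
        // so it can be evicted cheaply when a better doc arrives.
        private final PriorityQueue<Integer> heap;

        BoundedDocQueue(int numHits, DocComparator cmp) {
            if (numHits < 1) throw new IllegalArgumentException("numHits must be >= 1");
            this.numHits = numHits;
            this.cmp = cmp;
            this.heap = new PriorityQueue<>(numHits, (a, b) -> cmp.compare(b, a));
        }

        void collect(int doc) {
            if (heap.size() < numHits) {
                heap.add(doc);
            } else if (cmp.compare(doc, heap.peek()) < 0) {
                heap.poll();     // drop the current worst
                heap.add(doc);
            }
        }

        /** Kept docs in sort order, best first. */
        List<Integer> topDocs() {
            List<Integer> out = new ArrayList<>(heap);
            out.sort(cmp::compare);
            return out;
        }
    }

    static List<Integer> demo() {
        // Toy per-segment values: doc id -> int sort value.
        int[] values = {42, 7, 19, 3, 88, 7, 25};
        // Ascending by value, ties broken by lower doc id.
        DocComparator byValue = (a, b) -> {
            int c = Integer.compare(values[a], values[b]);
            return c != 0 ? c : Integer.compare(a, b);
        };
        BoundedDocQueue q = new BoundedDocQueue(3, byValue);
        for (int doc = 0; doc < values.length; doc++) {
            q.collect(doc);
        }
        return q.topDocs();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [3, 1, 5]
    }
}
```

Keeping the worst retained hit at the head of the heap means each new doc costs one comparison against that head, and the heap is only modified when the doc actually makes the cut.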

On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.wang@gmail.com> wrote:

> Hi Michael:
>      It is open: http://code.google.com/p/lucene-book/source/checkout
>
>      I think I sent the https url instead, sorry.
>
>     The multi PQ sorting is fairly self-contained. I have 2 versions, 1 for
> string and 1 for int; each is a Collector impl.
>
>      I shouldn't have said the Multi Q is faster on int sort; the difference
> is within the error bounds. It is very, very small, so I would say the two
> are essentially equal.
>
>      If you think it is a good thing to go this way (if not for the perf,
> then just for the simpler API), I'd be happy to work on a patch.
>
> Thanks
>
> -John
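The multi-PQ idea can be sketched under one assumption: each segment fills its own bounded queue (so comparisons inside a segment can use cheap per-segment values such as ordinals), and the per-segment results are merged once at the end by their actual sort values. The Java below shows only that final k-way merge for String values. Hit, Cursor and merge are hypothetical names; this is not the code in the lucene-book repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of the final step of a "one queue per segment" sort:
// each segment has already produced its own top-N hits (sorted, best first);
// those lists are merged by their actual String sort values into a global top-N.
public class MultiQueueMergeSketch {

    /** One hit from a segment: docBase + doc gives the global doc id. */
    static final class Hit {
        final int docBase;
        final int doc;
        final String value;
        Hit(int docBase, int doc, String value) {
            this.docBase = docBase;
            this.doc = doc;
            this.value = value;
        }
        int globalDoc() { return docBase + doc; }
    }

    /** Read position inside one segment's sorted hit list. */
    private static final class Cursor {
        final List<Hit> hits;
        int pos;
        Cursor(List<Hit> hits) { this.hits = hits; }
        Hit current() { return hits.get(pos); }
    }

    /** k-way merge of per-segment sorted lists, keeping at most numHits. */
    static List<Hit> merge(List<List<Hit>> perSegment, int numHits) {
        PriorityQueue<Cursor> pq = new PriorityQueue<>((x, y) -> {
            int c = x.current().value.compareTo(y.current().value);
            // Tie-break on global doc id so the result is deterministic.
            return c != 0 ? c : Integer.compare(x.current().globalDoc(), y.current().globalDoc());
        });
        for (List<Hit> hits : perSegment) {
            if (!hits.isEmpty()) pq.add(new Cursor(hits));
        }
        List<Hit> out = new ArrayList<>();
        while (out.size() < numHits && !pq.isEmpty()) {
            Cursor c = pq.poll();
            out.add(c.current());
            c.pos++;
            if (c.pos < c.hits.size()) pq.add(c); // re-queue with its next hit
        }
        return out;
    }

    static List<String> demo() {
        // Two segments; the second one starts at global doc id 100.
        List<Hit> seg0 = List.of(new Hit(0, 4, "apple"), new Hit(0, 9, "melon"));
        List<Hit> seg1 = List.of(new Hit(100, 2, "banana"), new Hit(100, 7, "cherry"));
        List<String> result = new ArrayList<>();
        for (Hit h : merge(List.of(seg0, seg1), 3)) {
            result.add(h.globalDoc() + ":" + h.value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [4:apple, 102:banana, 107:cherry]
    }
}
```

The merge touches at most numHits entries per segment, so its cost does not grow with the number of matching documents; the per-hit work stays inside each segment's own queue.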
>
> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>
>> John, looks like this requires login -- any plans to open that up, or
>> post the code on an issue?
>>
>> How self-contained is your Multi PQ sorting?  E.g., is it a standalone
>> Collector impl that I can test?
>>
>> Mike
>>
>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.wang@gmail.com> wrote:
>> > BTW, we have a little sandbox for these experiments, and all my test
>> > code is there. It is not very polished.
>> >
>> > https://lucene-book.googlecode.com/svn/trunk
>> >
>> > -John
>> >
>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <john.wang@gmail.com> wrote:
>> >>
>> >> Numbers Mike requested for Int types:
>> >>
>> >> Only time and CPU time are posted; the other numbers are all the same
>> >> since the algorithm is the same.
>> >>
>> >> Lucene 2.9:
>> >> numhits: 10
>> >> time: 14619495
>> >> cpu: 146126
>> >>
>> >> numhits: 20
>> >> time: 14550568
>> >> cpu: 163242
>> >>
>> >> numhits: 100
>> >> time: 16467647
>> >> cpu: 178379
>> >>
>> >>
>> >> my test:
>> >> numHits: 10
>> >> time: 14101094
>> >> cpu: 144715
>> >>
>> >> numHits: 20
>> >> time: 14804821
>> >> cpu: 151305
>> >>
>> >> numHits: 100
>> >> time: 15372157
>> >> cpu: 158842
>> >>
>> >> Conclusions:
>> >> They are very similar; the differences are all within error bounds,
>> >> especially with smaller PQ sizes, where the second sort algorithm is
>> >> again slightly faster.
>> >>
>> >> Hope this helps.
>> >>
>> >> -John
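To put the int-sort numbers above on a common scale, the small Java program below computes, for each numHits, how far "my test" is below Lucene 2.9 as a percentage of the Lucene 2.9 figure (negative means "my test" was higher). The arrays are copied verbatim from the posted results; the units of "time" and "cpu" are whatever the benchmark printed, and only ratios are used. The numbers alone do not establish the error bounds; that would need the spread across repeated runs.

```java
// Relative difference between the two int-sort runs posted above, computed as
// (lucene29 - myTest) / lucene29 * 100, so a positive value means "my test"
// reported the lower number. Only ratios are used, so units do not matter.
public class IntSortDiff {
    static final int[] NUM_HITS = {10, 20, 100};
    static final long[] TIME_LUCENE29 = {14619495L, 14550568L, 16467647L};
    static final long[] TIME_MY_TEST  = {14101094L, 14804821L, 15372157L};
    static final long[] CPU_LUCENE29  = {146126L, 163242L, 178379L};
    static final long[] CPU_MY_TEST   = {144715L, 151305L, 158842L};

    /** Percent by which candidate is below baseline (negative if above). */
    static double pctDiff(long baseline, long candidate) {
        return (baseline - candidate) * 100.0 / baseline;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NUM_HITS.length; i++) {
            System.out.printf("numHits=%d  time %+.2f%%  cpu %+.2f%%%n",
                    NUM_HITS[i],
                    pctDiff(TIME_LUCENE29[i], TIME_MY_TEST[i]),
                    pctDiff(CPU_LUCENE29[i], CPU_MY_TEST[i]));
        }
    }
}
```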
>> >>
>> >>
>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
>> >> <yonik@lucidimagination.com> wrote:
>> >>>
>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
>> >>> <lucene@mikemccandless.com> wrote:
>> >>> > Though it'd be odd if the switch to searching by segment
>> >>> > really was most of the gains here.
>> >>>
>> >>> I had assumed that much of the improvement was due to ditching
>> >>> MultiTermEnum/MultiTermDocs.
>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only helps
>> >>> with queries that use a TermEnum (range, prefix, etc).
>> >>>
>> >>> -Yonik
>> >>> http://www.lucidimagination.com
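Yonik's point about MultiTermEnum can be made concrete with a simplified sketch of what any merged, multi-segment term enumeration has to do: keep each segment's sorted term stream in a priority queue, pop the smallest term, and combine every segment positioned on that term. Searching per segment lets each segment's own enumeration be used directly, so this queue work goes away. SegTerms and mergeTerms are hypothetical names; this is a conceptual Java sketch, not Lucene's MultiTermEnum source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a merged term enumeration over several segments.
// Each segment's terms arrive sorted; a priority queue pops the smallest
// term and combines every segment positioned on it (docFreq summed).
// Not Lucene's MultiTermEnum source.
public class MergedTermsSketch {

    /** One segment's sorted terms with parallel docFreqs, plus a cursor. */
    static final class SegTerms {
        final String[] terms;   // sorted ascending, unique within the segment
        final int[] docFreqs;   // docFreqs[i] belongs to terms[i]
        int pos;
        SegTerms(String[] terms, int[] docFreqs) {
            this.terms = terms;
            this.docFreqs = docFreqs;
        }
        String term() { return terms[pos]; }
    }

    /** Returns "term:docFreq" entries in merged, sorted order. */
    static List<String> mergeTerms(List<SegTerms> segments) {
        PriorityQueue<SegTerms> pq =
                new PriorityQueue<>((a, b) -> a.term().compareTo(b.term()));
        for (SegTerms s : segments) {
            if (s.terms.length > 0) pq.add(s);
        }
        List<String> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            String term = pq.peek().term();
            int docFreq = 0;
            // Pop every segment currently positioned on this term.
            while (!pq.isEmpty() && pq.peek().term().equals(term)) {
                SegTerms s = pq.poll();
                docFreq += s.docFreqs[s.pos];
                s.pos++;
                if (s.pos < s.terms.length) pq.add(s); // advance and re-queue
            }
            out.add(term + ":" + docFreq);
        }
        return out;
    }

    static List<String> demo() {
        SegTerms seg0 = new SegTerms(new String[] {"apple", "lucene", "zebra"}, new int[] {2, 5, 1});
        SegTerms seg1 = new SegTerms(new String[] {"lucene", "sort"}, new int[] {3, 4});
        return mergeTerms(List.of(seg0, seg1));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [apple:2, lucene:8, sort:4, zebra:1]
    }
}
```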
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> >>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>> >>>
>> >>
>> >
>> >
>>
>
