lucene-dev mailing list archives

From John Wang <john.w...@gmail.com>
Subject Re: lucene 2.9 sorting algorithm
Date Fri, 16 Oct 2009 16:33:30 GMT
Mike, just a clarification on my first perf report email.
In the first section, numHits is incorrectly labeled; it should be 20 instead
of 50. Sorry about the possible confusion.

Thanks

-John

On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:

> Thanks John; I'll have a look.
>
> Mike
>
> On Fri, Oct 16, 2009 at 12:57 AM, John Wang <john.wang@gmail.com> wrote:
> > Hi Michael:
> >     I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector as
> > a more general case. I think keeping the old api for ScoreDocComparator and
> > SortComparatorSource would work.
> >   Please take a look.
> > Thanks
> > -John
> >
> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.wang@gmail.com> wrote:
> >>
> >> Hi Michael:
> >>      It is open, http://code.google.com/p/lucene-book/source/checkout
> >>      I think I sent the https url instead, sorry.
> >>     The multi PQ sorting is fairly self-contained. I have 2 versions, 1
> >> for string and 1 for int; each is a Collector impl.
> >>      I shouldn't say the Multi PQ is faster on int sort; it is within the
> >> error boundary. The diff is very, very small; I would say they are
> >> roughly equal.
> >>      If you think it is a good thing to go this way (if not for the perf,
> >> just for the simpler api), I'd be happy to work on a patch.
> >> Thanks
> >> -John
> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
> >> <lucene@mikemccandless.com> wrote:
> >>>
> >>> John, looks like this requires login -- any plans to open that up, or,
> >>> post the code on an issue?
> >>>
> >>> How self-contained is your Multi PQ sorting?  EG is it a standalone
> >>> Collector impl that I can test?
> >>>
> >>> Mike
> >>>
> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.wang@gmail.com> wrote:
> >>> > BTW, we have a little sandbox for these experiments, and all my
> >>> > test code is at the URL below. It is not very polished.
> >>> >
> >>> > https://lucene-book.googlecode.com/svn/trunk
> >>> >
> >>> > -John
> >>> >
> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang <john.wang@gmail.com> wrote:
> >>> >>
> >>> >> Numbers Mike requested for Int types:
> >>> >>
> >>> >> Only the time/cputime are posted; others are all the same since the
> >>> >> algorithm is the same.
> >>> >>
> >>> >> Lucene 2.9:
> >>> >> numhits: 10
> >>> >> time: 14619495
> >>> >> cpu: 146126
> >>> >>
> >>> >> numhits: 20
> >>> >> time: 14550568
> >>> >> cpu: 163242
> >>> >>
> >>> >> numhits: 100
> >>> >> time: 16467647
> >>> >> cpu: 178379
> >>> >>
> >>> >>
> >>> >> my test:
> >>> >> numHits: 10
> >>> >> time: 14101094
> >>> >> cpu: 144715
> >>> >>
> >>> >> numHits: 20
> >>> >> time: 14804821
> >>> >> cpu: 151305
> >>> >>
> >>> >> numHits: 100
> >>> >> time: 15372157
> >>> >> cpu: 158842
> >>> >>
> >>> >> Conclusions:
> >>> >> They are very similar; the differences are all within error bounds,
> >>> >> especially with lower PQ sizes, where the second sort alg is again
> >>> >> slightly faster.
> >>> >>
> >>> >> Hope this helps.
> >>> >>
> >>> >> -John
> >>> >>
> >>> >>
> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
> >>> >> <yonik@lucidimagination.com>
> >>> >> wrote:
> >>> >>>
> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
> >>> >>> <lucene@mikemccandless.com> wrote:
> >>> >>> > Though it'd be odd if the switch to searching by segment
> >>> >>> > really was most of the gains here.
> >>> >>>
> >>> >>> I had assumed that much of the improvement was due to ditching
> >>> >>> MultiTermEnum/MultiTermDocs.
> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only helps
> >>> >>> with queries that use a TermEnum (range, prefix, etc).
> >>> >>>
> >>> >>> -Yonik
> >>> >>> http://www.lucidimagination.com
> >>> >>>
> >>> >>>
> >>> >>> ---------------------------------------------------------------------
> >>> >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >>> >>> For additional commands, e-mail: java-dev-help@lucene.apache.org
> >>> >>>
> >>> >>
> >>> >
> >>> >
> >>>
> >>>
> >>
> >
> >
>
>
>
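
A minimal sketch of the "multi PQ" idea discussed in the thread, for an int
sort. It assumes that "multi PQ" means one bounded priority queue per segment
whose results are merged at the end; the thread itself does not spell this
out. The class MultiPQSketch, the Hit type, and the methods collectSegment
and topN are hypothetical names made up for illustration. This is not the
code from the sandbox and it does not use any Lucene API (segments are plain
int arrays of sort keys).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch only; not the code from the thread and not a Lucene API. */
final class MultiPQSketch {

    /** One collected hit: global doc id plus its int sort key. */
    static final class Hit {
        final int doc;
        final int value;
        Hit(int doc, int value) { this.doc = doc; this.value = value; }
    }

    // Ascending by sort key; ties are broken by the lower doc id.
    static final Comparator<Hit> ORDER =
        Comparator.<Hit>comparingInt(h -> h.value).thenComparingInt(h -> h.doc);

    /** Keeps the n best hits of one segment; the heap head is the worst kept hit. */
    static PriorityQueue<Hit> collectSegment(int[] values, int docBase, int n) {
        PriorityQueue<Hit> pq = new PriorityQueue<>(ORDER.reversed());
        if (n <= 0) {
            return pq;
        }
        for (int i = 0; i < values.length; i++) {
            Hit hit = new Hit(docBase + i, values[i]);
            if (pq.size() < n) {
                pq.add(hit);
            } else if (ORDER.compare(hit, pq.peek()) < 0) {
                pq.poll();   // evict the current worst hit
                pq.add(hit);
            }
        }
        return pq;
    }

    /** Runs one queue per segment, then merges the per-segment winners. */
    static int[] topN(int[][] segments, int n) {
        List<Hit> candidates = new ArrayList<>();
        int docBase = 0;
        for (int[] segment : segments) {
            candidates.addAll(collectSegment(segment, docBase, n));
            docBase += segment.length;   // next segment's first global doc id
        }
        candidates.sort(ORDER);
        int size = Math.max(0, Math.min(n, candidates.size()));
        int[] docs = new int[size];
        for (int i = 0; i < size; i++) {
            docs[i] = candidates.get(i).doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[][] segments = { {5, 1, 9}, {3, 7}, {2, 8, 4} };
        System.out.println(Arrays.toString(topN(segments, 3))); // prints [1, 5, 3]
    }
}
```

Each per-segment queue only ever compares values from its own segment. For
string sorts, that property would let ordinals be compared without
cross-segment conversion, though the sketch above only covers ints.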
