lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: lucene 2.9 sorting algorithm
Date Wed, 21 Oct 2009 09:47:18 GMT
OK, thanks.

I can help out if you've got questions on the python code... it's
rather straightforward: it just iterates over each set of params to
test, writes an alg file, runs it, opens the resulting output & parses
it for the best run, confirms both single & multi PQ gave precisely
the same doc IDs, and prints the results.
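
As a rough illustration of that loop (a sketch only, not the actual script from the patch): the parameter grid, the placeholder alg text, the output line format and the `fake_benchmark` stand-in below are all assumptions made up for the example. The real wrapper launches contrib/benchmark and parses its real output.

```python
import itertools
import re

# Hypothetical parameter grid; the real script's values are not given here.
PARAMS = {
    "query": ["all", "term"],
    "sort": ["int", "string"],
    "topn": [10, 100],
}

def write_alg(params):
    """Render placeholder 'alg' text for one parameter set (not real alg syntax)."""
    return "\n".join(f"{k}={v}" for k, v in sorted(params.items())) + "\n"

# Assumed output line format: "run <n> <impl> time=<seconds> docs=<id,id,...>"
LINE = re.compile(r"^run \d+ (\w+) time=([\d.]+) docs=([\d,]*)$")

def parse_best(output):
    """Return {impl: (best_time, doc_ids)}, keeping the fastest run per impl."""
    best = {}
    for line in output.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        impl, secs = m.group(1), float(m.group(2))
        ids = [int(d) for d in m.group(3).split(",") if d]
        if impl not in best or secs < best[impl][0]:
            best[impl] = (secs, ids)
    return best

def run_all(run_benchmark, grid=PARAMS):
    """For each parameter set: write the alg, run it, parse, verify, collect."""
    results = []
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        best = parse_best(run_benchmark(write_alg(params)))
        single, multi = best["single"], best["multi"]
        # Both approaches must return precisely the same doc IDs.
        if single[1] != multi[1]:
            raise AssertionError(f"doc IDs differ for {params}")
        results.append((params, single[0], multi[0]))
    return results

def fake_benchmark(alg_text):
    """Stand-in for launching the real benchmark; returns canned output."""
    return (
        "run 1 single time=1.50 docs=3,1,2\n"
        "run 2 single time=1.20 docs=3,1,2\n"
        "run 1 multi time=1.40 docs=3,1,2\n"
        "run 2 multi time=1.60 docs=3,1,2\n"
    )

if __name__ == "__main__":
    for params, single_s, multi_s in run_all(fake_benchmark):
        print(params, f"single={single_s:.2f}s multi={multi_s:.2f}s")
```

Passing the runner in as a function keeps the parse/compare logic testable without starting a JVM.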

It's remotely possible the difference in the results is a bug/overhead
in contrib/benchmark itself, which'd be good to get to the bottom of
anyway.

Mike

On Tue, Oct 20, 2009 at 9:17 PM, John Wang <john.wang@gmail.com> wrote:
> Hi Mike:
>     That's weird. Let me take a look at the patch. Need to brush up on
> python though :)
> Thanks
> -John
>
> On Tue, Oct 20, 2009 at 10:25 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> OK I posted a patch that folds the MultiPQ approach into
>> contrib/benchmark, plus a simple python wrapper to run old/new tests
>> across different queries, sort, topN, etc.
>>
>> But I got different results... MultiPQ looks generally slower than
>> SinglePQ.  So I think we now need to reconcile what's different
>> between our tests.
>>
>> Mike
>>
>> On Mon, Oct 19, 2009 at 9:28 PM, John Wang <john.wang@gmail.com> wrote:
>> > Hi Michael:
>> >      Was wondering if you got a chance to take a look at this.
>> >      Since deprecated APIs are being removed in 3.0, I was wondering
>> > if/when we would decide whether to keep the ScoreDocComparator API,
>> > and thus whether it would be kept for Lucene 3.0.
>> > Thanks
>> > -John
>> >
>> > On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless
>> > <lucene@mikemccandless.com> wrote:
>> >>
>> >> Oh, no problem...
>> >>
>> >> Mike
>> >>
>> >> On Fri, Oct 16, 2009 at 12:33 PM, John Wang <john.wang@gmail.com>
>> >> wrote:
>> >> > Mike, just a clarification on my first perf report email.
>> >> > In the first section, numHits is incorrectly labeled; it should be 20
>> >> > instead of 50. Sorry about the possible confusion.
>> >> > Thanks
>> >> > -John
>> >> >
>> >> > On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless
>> >> > <lucene@mikemccandless.com> wrote:
>> >> >>
>> >> >> Thanks John; I'll have a look.
>> >> >>
>> >> >> Mike
>> >> >>
>> >> >> On Fri, Oct 16, 2009 at 12:57 AM, John Wang <john.wang@gmail.com>
>> >> >> wrote:
>> >> >> > Hi Michael:
>> >> >> >     I added classes: ScoreDocComparatorQueue and
>> >> >> > OneSortNoScoreCollector as a more general case. I think keeping the
>> >> >> > old api for ScoreDocComparator and SortComparatorSource would work.
>> >> >> >   Please take a look.
>> >> >> > Thanks
>> >> >> > -John
>> >> >> >
>> >> >> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang <john.wang@gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi Michael:
>> >> >> >>      It is open:
>> >> >> >> http://code.google.com/p/lucene-book/source/checkout
>> >> >> >>      I think I sent the https url instead, sorry.
>> >> >> >>      The multi PQ sorting is fairly self-contained. I have 2
>> >> >> >> versions, 1 for string and 1 for int; each is a Collector impl.
>> >> >> >>      I shouldn't say the Multi PQ is faster on int sort; it is
>> >> >> >> within the error boundary. The diff is very, very small; I would
>> >> >> >> say they are more or less equal.
>> >> >> >>      If you think it is a good thing to go this way (if not for
>> >> >> >> the perf, just for the simpler api), I'd be happy to work on a patch.
>> >> >> >> Thanks
>> >> >> >> -John
>> >> >> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
>> >> >> >> <lucene@mikemccandless.com> wrote:
>> >> >> >>>
>> >> >> >>> John, looks like this requires login -- any plans to open that
>> >> >> >>> up, or post the code on an issue?
>> >> >> >>>
>> >> >> >>> How self-contained is your Multi PQ sorting?  EG is it a
>> >> >> >>> standalone Collector impl that I can test?
>> >> >> >>>
>> >> >> >>> Mike
>> >> >> >>>
>> >> >> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang <john.wang@gmail.com>
>> >> >> >>> wrote:
>> >> >> >>> > BTW, we have a little sandbox for these experiments, and all my
>> >> >> >>> > test code is at the URL below. It is not very polished.
>> >> >> >>> >
>> >> >> >>> > https://lucene-book.googlecode.com/svn/trunk
>> >> >> >>> >
>> >> >> >>> > -John
>> >> >> >>> >
>> >> >> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang
>> >> >> >>> > <john.wang@gmail.com>
>> >> >> >>> > wrote:
>> >> >> >>> >>
>> >> >> >>> >> Numbers Mike requested for Int types:
>> >> >> >>> >>
>> >> >> >>> >> Only the time/cputime are posted; the others are all the same
>> >> >> >>> >> since the algorithm is the same.
>> >> >> >>> >>
>> >> >> >>> >> Lucene 2.9:
>> >> >> >>> >> numhits: 10
>> >> >> >>> >> time: 14619495
>> >> >> >>> >> cpu: 146126
>> >> >> >>> >>
>> >> >> >>> >> numhits: 20
>> >> >> >>> >> time: 14550568
>> >> >> >>> >> cpu: 163242
>> >> >> >>> >>
>> >> >> >>> >> numhits: 100
>> >> >> >>> >> time: 16467647
>> >> >> >>> >> cpu: 178379
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> my test:
>> >> >> >>> >> numHits: 10
>> >> >> >>> >> time: 14101094
>> >> >> >>> >> cpu: 144715
>> >> >> >>> >>
>> >> >> >>> >> numHits: 20
>> >> >> >>> >> time: 14804821
>> >> >> >>> >> cpu: 151305
>> >> >> >>> >>
>> >> >> >>> >> numHits: 100
>> >> >> >>> >> time: 15372157
>> >> >> >>> >> cpu: 158842
>> >> >> >>> >>
>> >> >> >>> >> Conclusions:
>> >> >> >>> >> They are very similar; the differences are all within error
>> >> >> >>> >> bounds, especially with lower PQ sizes, where the second sort
>> >> >> >>> >> alg is again slightly faster.
>> >> >> >>> >>
>> >> >> >>> >> Hope this helps.
>> >> >> >>> >>
>> >> >> >>> >> -John
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
>> >> >> >>> >> <yonik@lucidimagination.com>
>> >> >> >>> >> wrote:
>> >> >> >>> >>>
>> >> >> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
>> >> >> >>> >>> <lucene@mikemccandless.com> wrote:
>> >> >> >>> >>> > Though it'd be odd if the switch to searching by segment
>> >> >> >>> >>> > really was most of the gains here.
>> >> >> >>> >>>
>> >> >> >>> >>> I had assumed that much of the improvement was due to ditching
>> >> >> >>> >>> MultiTermEnum/MultiTermDocs.
>> >> >> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only
>> >> >> >>> >>> helps with queries that use a TermEnum (range, prefix, etc).
>> >> >> >>> >>>
>> >> >> >>> >>> -Yonik
>> >> >> >>> >>> http://www.lucidimagination.com
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>> ---------------------------------------------------------------------
>> >> >> >>> >>> To unsubscribe, e-mail:
>> >> >> >>> >>> java-dev-unsubscribe@lucene.apache.org
>> >> >> >>> >>> For additional commands, e-mail:
>> >> >> >>> >>> java-dev-help@lucene.apache.org
>> >> >> >>> >>>
>> >> >> >>> >>
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >>
>> >
>> >
>>
>>
>
>


