Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 57793 invoked from network); 20 Oct 2009 08:58:33 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 20 Oct 2009 08:58:33 -0000 Received: (qmail 53398 invoked by uid 500); 20 Oct 2009 08:58:32 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 53299 invoked by uid 500); 20 Oct 2009 08:58:32 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 53291 invoked by uid 99); 20 Oct 2009 08:58:32 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Oct 2009 08:58:32 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: local policy) Received: from [209.85.210.192] (HELO mail-yx0-f192.google.com) (209.85.210.192) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Oct 2009 08:58:21 +0000 Received: by yxe30 with SMTP id 30so7035287yxe.29 for ; Tue, 20 Oct 2009 01:57:59 -0700 (PDT) MIME-Version: 1.0 Received: by 10.150.27.1 with SMTP id a1mr10086856yba.125.1256029079063; Tue, 20 Oct 2009 01:57:59 -0700 (PDT) In-Reply-To: <8837fb770910191828u581ed5d7t16f1ef08647c772e@mail.gmail.com> References: <8837fb770910142012w5f11ba57y99aeb18493603813@mail.gmail.com> <8837fb770910151529o6b7a27ebof9dfd3f82a0e88ad@mail.gmail.com> <8837fb770910151533n7c3239b6k4668832c010e01c2@mail.gmail.com> <9ac0c6aa0910151718l7826f226o5996be66b5a85d21@mail.gmail.com> <8837fb770910151852o2f0f7930se9fddcf90fa36b91@mail.gmail.com> <8837fb770910152157w7f03165du9bcc5fe30605481b@mail.gmail.com> <9ac0c6aa0910160321t33f9cee2ke8a3e6a526e1cd80@mail.gmail.com> <8837fb770910160933w39e19046lac876baf74235a0@mail.gmail.com> <9ac0c6aa0910160953p6742c2cfx7377a2c82ede3e88@mail.gmail.com> <8837fb770910191828u581ed5d7t16f1ef08647c772e@mail.gmail.com> Date: Tue, 20 Oct 2009 04:57:59 -0400 Message-ID: <9ac0c6aa0910200157q4cd031bcu2c7eb897ed86059@mail.gmail.com> Subject: Re: lucene 2.9 sorting algorithm From: Michael McCandless To: java-dev@lucene.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org Sorry, I have been digging into it, just didn't get far enough to post patch/results. I'll try to do so today. I did find one bug in OneSortNoScoreCollector, in the getTop() method in the inner compare() method, to break ties it should be: if (v=3D=3D0 { v =3D o1.doc + o1.comparatorQueue._base - o2.doc - o2.comparatorQueue._= base; } not if (v=3D=3D0) { v=3Do1.doc+o2.comparatorQueue._base-o2.doc+o2.comparatorQueue._base; } I've folded the "multi PQ" approach into contrib/benchmark so we can run our tests... Mike On Mon, Oct 19, 2009 at 9:28 PM, John Wang wrote: > Hi Michael: > =A0=A0 =A0 Was wondering if you got a chance to take a look at this. > =A0=A0 =A0 Since deprecated APIs are being removed in 3.0, I was wonderin= g if/when > we would decide on keeping the ScoreDocComparator API and thus would be k= ept > for Lucene 3.0. > Thanks > -John > > On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless > wrote: >> >> Oh, no problem... >> >> Mike >> >> On Fri, Oct 16, 2009 at 12:33 PM, John Wang wrote: >> > Mike, just a clarification on my first perf report email. >> > The first section, numHits is incorrectly labeled, it should be 20 >> > instead >> > of 50. Sorry about the possible confusion. >> > Thanks >> > -John >> > >> > On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless >> > wrote: >> >> >> >> Thanks John; I'll have a look. >> >> >> >> Mike >> >> >> >> On Fri, Oct 16, 2009 at 12:57 AM, John Wang >> >> wrote: >> >> > Hi Michael: >> >> > =A0=A0 =A0I added classes:=A0ScoreDocComparatorQueue >> >> > and=A0OneSortNoScoreCollector >> >> > as >> >> > a more general case. I think keeping the old api for >> >> > ScoreDocComparator >> >> > and >> >> > SortComparatorSource would work. >> >> > =A0=A0Please take a look. >> >> > Thanks >> >> > -John >> >> > >> >> > On Thu, Oct 15, 2009 at 6:52 PM, John Wang >> >> > wrote: >> >> >> >> >> >> Hi Michael: >> >> >> =A0=A0 =A0 It is >> >> >> open,=A0http://code.google.com/p/lucene-book/source/checkout >> >> >> =A0=A0 =A0 I think I sent the https url instead, sorry. >> >> >> =A0=A0 =A0The multi PQ sorting is fairly self-contained, I have 2 >> >> >> versions, 1 >> >> >> for string and 1 for int, each are Collector impls. >> >> >> =A0=A0 =A0 I shouldn't say the Multi Q is faster on int sort, it i= s within >> >> >> the >> >> >> error boundary. The diff is very very small, I would stay they are >> >> >> more >> >> >> equal. >> >> >> =A0=A0 =A0 If you think it is a good thing to go this way, (if not= for the >> >> >> perf, >> >> >> just for the simpler api) I'd be happy to work on a patch. >> >> >> Thanks >> >> >> -John >> >> >> On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless >> >> >> wrote: >> >> >>> >> >> >>> John, looks like this requires login -- any plans to open that up= , >> >> >>> or, >> >> >>> post the code on an issue? >> >> >>> >> >> >>> How self-contained is your Multi PQ sorting? =A0EG is it a standa= lone >> >> >>> Collector impl that I can test? >> >> >>> >> >> >>> Mike >> >> >>> >> >> >>> On Thu, Oct 15, 2009 at 6:33 PM, John Wang >> >> >>> wrote: >> >> >>> > BTW, we are have a little sandbox for these experiments. And al= l >> >> >>> > my >> >> >>> > testcode >> >> >>> > are at. They are not very polished. >> >> >>> > >> >> >>> > https://lucene-book.googlecode.com/svn/trunk >> >> >>> > >> >> >>> > -John >> >> >>> > >> >> >>> > On Thu, Oct 15, 2009 at 3:29 PM, John Wang >> >> >>> > wrote: >> >> >>> >> >> >> >>> >> Numbers Mike requested for Int types: >> >> >>> >> >> >> >>> >> only the time/cputime are posted, others are all the same sinc= e >> >> >>> >> the >> >> >>> >> algorithm is the same. >> >> >>> >> >> >> >>> >> Lucene 2.9: >> >> >>> >> numhits: 10 >> >> >>> >> time: 14619495 >> >> >>> >> cpu: 146126 >> >> >>> >> >> >> >>> >> numhits: 20 >> >> >>> >> time: 14550568 >> >> >>> >> cpu: 163242 >> >> >>> >> >> >> >>> >> numhits: 100 >> >> >>> >> time: 16467647 >> >> >>> >> cpu: 178379 >> >> >>> >> >> >> >>> >> >> >> >>> >> my test: >> >> >>> >> numHits: 10 >> >> >>> >> time: 14101094 >> >> >>> >> cpu: 144715 >> >> >>> >> >> >> >>> >> numHits: 20 >> >> >>> >> time: 14804821 >> >> >>> >> cpu: 151305 >> >> >>> >> >> >> >>> >> numHits: 100 >> >> >>> >> time: 15372157 >> >> >>> >> cpu time: 158842 >> >> >>> >> >> >> >>> >> Conclusions: >> >> >>> >> The are very similar, the differences are all within error >> >> >>> >> bounds, >> >> >>> >> especially with lower PQ sizes, which second sort alg again >> >> >>> >> slightly >> >> >>> >> faster. >> >> >>> >> >> >> >>> >> Hope this helps. >> >> >>> >> >> >> >>> >> -John >> >> >>> >> >> >> >>> >> >> >> >>> >> On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley >> >> >>> >> >> >> >>> >> wrote: >> >> >>> >>> >> >> >>> >>> On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless >> >> >>> >>> wrote: >> >> >>> >>> > Though it'd be odd if the switch to searching by segment >> >> >>> >>> > really was most of the gains here. >> >> >>> >>> >> >> >>> >>> I had assumed that much of the improvement was due to ditchin= g >> >> >>> >>> MultiTermEnum/MultiTermDocs. >> >> >>> >>> Note that LUCENE-1483 was before LUCENE-1596... but that only >> >> >>> >>> helps >> >> >>> >>> with queries that use a TermEnum (range, prefix, etc). >> >> >>> >>> >> >> >>> >>> -Yonik >> >> >>> >>> http://www.lucidimagination.com >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> -------------------------------------------------------------= -------- >> >> >>> >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.or= g >> >> >>> >>> For additional commands, e-mail: >> >> >>> >>> java-dev-help@lucene.apache.org >> >> >>> >>> >> >> >>> >> >> >> >>> > >> >> >>> > >> >> >>> >> >> >>> >> >> >>> -----------------------------------------------------------------= ---- >> >> >>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org >> >> >>> For additional commands, e-mail: java-dev-help@lucene.apache.org >> >> >>> >> >> >> >> >> > >> >> > >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org >> >> For additional commands, e-mail: java-dev-help@lucene.apache.org >> >> >> > >> > >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org >> For additional commands, e-mail: java-dev-help@lucene.apache.org >> > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org