Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 92513 invoked from network); 23 Oct 2009 02:35:41 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 23 Oct 2009 02:35:41 -0000 Received: (qmail 97008 invoked by uid 500); 23 Oct 2009 02:35:40 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 96932 invoked by uid 500); 23 Oct 2009 02:35:40 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 96924 invoked by uid 99); 23 Oct 2009 02:35:40 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 23 Oct 2009 02:35:40 +0000 X-ASF-Spam-Status: No, hits=-2.5 required=5.0 tests=AWL,BAYES_00,HTML_MESSAGE X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of jake.mannix@gmail.com designates 209.85.210.192 as permitted sender) Received: from [209.85.210.192] (HELO mail-yx0-f192.google.com) (209.85.210.192) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 23 Oct 2009 02:35:37 +0000 Received: by yxe30 with SMTP id 30so11331678yxe.29 for ; Thu, 22 Oct 2009 19:35:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type; bh=Y25NBo96ZuF/QS0F530CUL4GYSwlVnk8iFy4MRhOf9U=; b=ApbLFgBoY2vDBqmP04e44u1kxbV1uVagYYOvcmWQvVvAXI8mSHvQEnW5g+UABYq0xJ uzK2Lne89OLdJcCh2ckJzzAonpnXBTOTtDKOiS/SOYtF7bFlnuxbpsWTo0tQaBtc9rav 5PpGWVQGR36bMg4xHIVlUu64khTOV0CR+ea2s= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=R/FCCoM/yrevvISGCmBpnU+f6wc6MzZzDK1X+YIyVBBt2yVQlPA+V8/LcN4RMdzwuH TQnogY6NOam6lwq0a2DBYLBtQ65LRfK2tF9DUpEurbim1OyGexNZaBD2x9BQr4R8Utth ML+tULUUPt12qS59V9798WL+/7AU+9NSALXBw= MIME-Version: 1.0 Received: by 10.90.22.17 with SMTP id 17mr12995812agv.76.1256265316740; Thu, 22 Oct 2009 19:35:16 -0700 (PDT) In-Reply-To: <4AE114FA.7050309@gmail.com> References: <8837fb770910142012w5f11ba57y99aeb18493603813@mail.gmail.com> <21B2798F178148A794152ECCE322756C@VEGA> <4ADDB333.7020801@gmail.com> <4ADDB485.4000508@gmail.com> <8837fb770910200855v554c3939mdf8a6ad67a844a43@mail.gmail.com> <9ac0c6aa0910210311nffa18c4gd3acbf0e73ef7b58@mail.gmail.com> <8837fb770910212317g5a8c09ecid1810f301087583c@mail.gmail.com> <9ac0c6aa0910220238l5cdce87g314ea18cd85bd695@mail.gmail.com> <8837fb770910221837p287db022i6ed18349977381c6@mail.gmail.com> <4AE114FA.7050309@gmail.com> Date: Thu, 22 Oct 2009 19:35:16 -0700 Message-ID: <4b124c310910221935w58330d20u8ee6e09c433b0b68@mail.gmail.com> Subject: Re: lucene 2.9 sorting algorithm From: Jake Mannix To: java-dev@lucene.apache.org Content-Type: multipart/alternative; boundary=0016e6509768f2afdf0476910f22 --0016e6509768f2afdf0476910f22 Content-Type: text/plain; charset=ISO-8859-1 Mark, We're not seeing exactly the numbers that Mike is seeing in his tests, running with jdk 1.5 on intel macs, so we're trying to eliminate factors of difference. Point 2 does indeed make a difference, we've seen it, and it's only fair: the single pq comparator does this branch optimization but the current patch multi-pq does not, so let's level the playing field. John's on the road with limited net connectivity, but we'll have some numbers to compare more over the weekend for sure. -jake On Thu, Oct 22, 2009 at 7:29 PM, Mark Miller wrote: > Why? What might he find? Whats with the cryptic request? > > Why would Java 1.5 perform better than 1.6? It erases 20 and 40% gains? > > I know point 2 certainly doesn't. Cards on the table? > > John Wang wrote: > > Hey Michael: > > > > Would you mind rerunning the test you have with jdk1.5? > > > > Also, if you would, change the comparator method to avoid > > brachning for int and string comparators, e.g. > > > > > > return index.order[i.doc] - index.order[j.doc]; > > > > > > Thanks > > > > > > -John > > > > > > On Thu, Oct 22, 2009 at 2:38 AM, Michael McCandless > > > wrote: > > > > On Thu, Oct 22, 2009 at 2:17 AM, John Wang > > wrote: > > > > > I have been playing with the patch, and I think I have some > > information > > > that you might like. > > > Let me spend sometime and gather some more numbers and > > update in jira. > > > > Excellent! > > > > > say bottom has ords 23, 45, 76, each corresponding to a > > string. When > > > moving to the next segment, you need to make bottom to have ords > > that can be > > > comparable to other docs in this new segment, so you would need > > to find the > > > new ords for the values in 23,45 and 76, don't you? To find it, > > assuming the > > > values are s1,s2,s3, you would do a bin. search on the new val > > array, and > > > find index for s1,s2,s3. > > > > It's that inversion (from ord->Comparable in first seg, and > > Comparable->ord in second seg) that I'm trying to avoid (w/ this new > > proposal). > > > > > Which is 3 bin searches per convert, I am not sure > > > how you can short circuit it. Are you suggesting we call > > Comparable on > > > compareBottom until some doc beats it? > > > > I'm saying on seg transition you indeed get the Comparable for > current > > bottom, but, don't attempt to invert it. Instead, as seg 2 finds a > > hit, you get that hit's Comparables and compare to bottom. If it > > beats bottom, it goes into the queue. If it does not, you use the > ord > > (in seg 2's ord space) to "learn" a bottom in the ord space of seg 2. > > > > > That would hurt performance I lot though, no? > > > > Yeah I think likely it would, since we're talking about a binary > > search on transition VS having to do possibly many > > upgrade-to-Comparable and compare-Comparabls to slowly learn the > > equivalent ord in the new segment. I was proposing it for cases > where > > inversion is very difficult. But realistically, since you must keep > > around the ful ord -> Comparable for every segment anyway (in order > to > > merge in the end), inversion shouldn't ever actually be "difficult" > -- > > it'd just be a binary search on presumably in-RAM storage. > > > > Mike > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org > > > > For additional commands, e-mail: java-dev-help@lucene.apache.org > > > > > > > > > -- > - Mark > > http://www.lucidimagination.com > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org > For additional commands, e-mail: java-dev-help@lucene.apache.org > > --0016e6509768f2afdf0476910f22 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Mark,
=A0
=A0 We're not seeing exactly the numbers that Mike is = seeing in his tests,
running with jdk 1.5 on intel macs, so we're t= rying to eliminate factors of difference.

=A0 Point 2 does indeed ma= ke a difference, we've seen it, and it's only fair: the
single pq comparator does this branch optimization but the current patch mu= lti-pq
does not, so let's level the playing field.

=A0 John&= #39;s on the road with limited net connectivity, but we'll have some nu= mbers to
compare more over the weekend for sure.

=A0 -jake

On Thu, Oct 22, 2009 at 7:29 PM, Mark Miller <markrmiller@gmail.com= > wrote:
Why? What might h= e find? Whats with the cryptic request?

Why would Java 1.5 perform better than 1.6? It erases 20 and 40% gains?

I know point 2 certainly doesn't. Cards on the table?

John Wang wrote:
> Hey Michael:
>
> =A0 =A0 =A0 =A0Would you mind rerunning the test you have with jdk1.5?=
>
> =A0 =A0 =A0 =A0Also, if you would, change the comparator method to avo= id
> brachning for int and string comparators, e.g.
>
>
> =A0 =A0 =A0 return index.order[i.doc] - index.order[j.doc];
>
>
> Thanks
>
>
> -John
>
>
> On Thu, Oct 22, 2009 at 2:38 AM, Michael McCandless
> <lucene@mikemccandless.com <mailto:lucene@mikemccandless.com>> wrote:
>
> =A0 =A0 On Thu, Oct 22, 2009 at 2:17 AM, John Wang <john.wang@gmail.com
> =A0 =A0 <mailto:john.wang@gmail.com>> wrote:
>
> =A0 =A0 > =A0 =A0 =A0I have been playing with the patch, and I thin= k I have some
> =A0 =A0 information
> =A0 =A0 > that you might like.
> =A0 =A0 > =A0 =A0 =A0Let me spend sometime and gather some more num= bers and
> =A0 =A0 update in jira.
>
> =A0 =A0 Excellent!
>
> =A0 =A0 > =A0 =A0 =A0say bottom has ords 23, 45, 76, each correspon= ding to a
> =A0 =A0 string. When
> =A0 =A0 > moving to the next segment, you need to make bottom to ha= ve ords
> =A0 =A0 that can be
> =A0 =A0 > comparable to other docs in this new segment, so you woul= d need
> =A0 =A0 to find the
> =A0 =A0 > new ords for the values in 23,45 and 76, don't you? T= o find it,
> =A0 =A0 assuming the
> =A0 =A0 > values are s1,s2,s3, you would do a bin. search on the ne= w val
> =A0 =A0 array, and
> =A0 =A0 > find index for s1,s2,s3.
>
> =A0 =A0 It's that inversion (from ord->Comparable in first seg,= and
> =A0 =A0 Comparable->ord in second seg) that I'm trying to avoid= (w/ this new
> =A0 =A0 proposal).
>
> =A0 =A0 > Which is 3 bin searches per convert, I am not sure
> =A0 =A0 > how you can short circuit it. Are you suggesting we call<= br> > =A0 =A0 Comparable on
> =A0 =A0 > compareBottom until some doc beats it?
>
> =A0 =A0 I'm saying on seg transition you indeed get the Comparable= for current
> =A0 =A0 bottom, but, don't attempt to invert it. =A0Instead, as se= g 2 finds a
> =A0 =A0 hit, you get that hit's Comparables and compare to bottom.= =A0If it
> =A0 =A0 beats bottom, it goes into the queue. =A0If it does not, you u= se the ord
> =A0 =A0 (in seg 2's ord space) to "learn" a bottom in th= e ord space of seg 2.
>
> =A0 =A0 > That would hurt performance I lot though, no?
>
> =A0 =A0 Yeah I think likely it would, since we're talking about a = binary
> =A0 =A0 search on transition VS having to do possibly many
> =A0 =A0 upgrade-to-Comparable and compare-Comparabls to slowly learn t= he
> =A0 =A0 equivalent ord in the new segment. =A0I was proposing it for c= ases where
> =A0 =A0 inversion is very difficult. =A0But realistically, since you m= ust keep
> =A0 =A0 around the ful ord -> Comparable for every segment anyway (= in order to
> =A0 =A0 merge in the end), inversion shouldn't ever actually be &q= uot;difficult" --
> =A0 =A0 it'd just be a binary search on presumably in-RAM storage.=
>
> =A0 =A0 Mike
>
> =A0 =A0 --------------------------------------------------------------= -------
> =A0 =A0 To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> =A0 =A0 <mailto:java-dev-unsubscribe@lucene.apache.org>
> =A0 =A0 For additional commands, e-mail: java-dev-help@lucene.apache.org=
> =A0 =A0 <mailto:java-dev-help@lucene.apache.org>
>
>


--
- Mark

http://www.lu= cidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.= apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


--0016e6509768f2afdf0476910f22--