Reply-To: java-dev@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <4AE114FA.7050309@gmail.com>
References: <8837fb770910142012w5f11ba57y99aeb18493603813@mail.gmail.com>
	 <21B2798F178148A794152ECCE322756C@VEGA> <4ADDB333.7020801@gmail.com>
	 <4ADDB485.4000508@gmail.com>
	 <8837fb770910200855v554c3939mdf8a6ad67a844a43@mail.gmail.com>
	 <9ac0c6aa0910210311nffa18c4gd3acbf0e73ef7b58@mail.gmail.com>
	 <8837fb770910212317g5a8c09ecid1810f301087583c@mail.gmail.com>
	 <9ac0c6aa0910220238l5cdce87g314ea18cd85bd695@mail.gmail.com>
	 <8837fb770910221837p287db022i6ed18349977381c6@mail.gmail.com>
	 <4AE114FA.7050309@gmail.com>
Date: Thu, 22 Oct 2009 19:35:16 -0700
Message-ID: <4b124c310910221935w58330d20u8ee6e09c433b0b68@mail.gmail.com>
Subject: Re: lucene 2.9 sorting algorithm
From: Jake Mannix <jake.mannix@gmail.com>
To: java-dev@lucene.apache.org
Content-Type: multipart/alternative; boundary=0016e6509768f2afdf0476910f22

--0016e6509768f2afdf0476910f22
Content-Type: text/plain; charset=ISO-8859-1

Mark,

  We're not seeing quite the same numbers Mike is seeing in his tests when we
run with JDK 1.5 on Intel Macs, so we're trying to eliminate the factors that
differ between our setups.

  Point 2 does indeed make a difference; we've seen it, and it's only fair:
the single-PQ comparator does this branch optimization, but the multi-PQ
comparator in the current patch does not, so let's level the playing field.
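
A minimal sketch of the two comparator styles under discussion, assuming a hypothetical OrdCompareSketch class in which `order` plays the role of `index.order` (doc -> ord) from John's snippet; it is illustrative, not the patch's code. The subtraction form is only safe because ords are non-negative, so the difference of two ords cannot overflow:

```java
// Hypothetical sketch: comparing two docs by their per-segment int ords.
// `order` stands in for the doc -> ord array (index.order in John's snippet);
// the class and method names are illustrative, not Lucene's API.
final class OrdCompareSketch {

    // Branching version: these conditionals may compile to branches that
    // mispredict when hits arrive in no particular ord order.
    static int compareBranchy(int[] order, int docA, int docB) {
        int a = order[docA];
        int b = order[docB];
        if (a < b) {
            return -1;
        } else if (a > b) {
            return 1;
        }
        return 0;
    }

    // Branch-free version, as in "return index.order[i.doc] - index.order[j.doc];".
    // Correct only because ords are non-negative ints, so the difference cannot
    // overflow; for arbitrary int field values, subtraction can overflow and
    // return the wrong sign.
    static int compareBranchFree(int[] order, int docA, int docB) {
        return order[docA] - order[docB];
    }
}
```

Both methods agree in sign for any non-negative ords; whether the branch-free form is measurably faster depends on the JIT and the data, which is what the benchmark numbers in this thread are meant to settle.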

  John's on the road with limited net connectivity, but we'll definitely have
more numbers to compare over the weekend.

  -jake

On Thu, Oct 22, 2009 at 7:29 PM, Mark Miller <markrmiller@gmail.com> wrote:

> Why? What might he find? What's with the cryptic request?
>
> Why would Java 1.5 perform better than 1.6? Enough to erase 20% and 40% gains?
>
> I know point 2 certainly doesn't. Cards on the table?
>
> John Wang wrote:
> > Hey Michael:
> >
> >        Would you mind rerunning the test you have with jdk1.5?
> >
> >        Also, if you would, change the comparator method to avoid
> > branching for int and string comparators, e.g.
> >
> >
> >       return index.order[i.doc] - index.order[j.doc];
> >
> >
> > Thanks
> >
> >
> > -John
> >
> >
> > On Thu, Oct 22, 2009 at 2:38 AM, Michael McCandless
> > <lucene@mikemccandless.com> wrote:
> >
> >     On Thu, Oct 22, 2009 at 2:17 AM, John Wang <john.wang@gmail.com> wrote:
> >
> >     >      I have been playing with the patch, and I think I have some
> >     > information that you might like.
> >     >      Let me spend some time gathering some more numbers and
> >     > update JIRA.
> >
> >     Excellent!
> >
> >     >      Say bottom has ords 23, 45, 76, each corresponding to a
> >     > string. When moving to the next segment, you need to give bottom
> >     > ords that are comparable to other docs in this new segment, so you
> >     > would need to find the new ords for the values at 23, 45 and 76,
> >     > don't you? To find them, assuming the values are s1, s2, s3, you
> >     > would do a binary search on the new values array and find the
> >     > indexes of s1, s2, s3.
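
A minimal sketch of the per-entry conversion John describes, assuming the new segment exposes a sorted, de-duplicated ord -> value array (here a plain `String[] lookup`); OrdConversionSketch and its methods are hypothetical names. `Arrays.binarySearch` returns `-(insertionPoint) - 1` for a missing value, so an absent value is mapped to its predecessor's ord and flagged as inexact:

```java
import java.util.Arrays;

// Hypothetical sketch: mapping queue values into a new segment's ord space,
// one binary search per entry. `lookup` is assumed to be that segment's
// sorted, de-duplicated ord -> value array; names are illustrative only.
final class OrdConversionSketch {

    /** Result of mapping one value into the new segment's ord space. */
    static final class ConvertedOrd {
        final int ord;        // ord of the value, or of its predecessor if absent
        final boolean exact;  // true if the value itself occurs in this segment

        ConvertedOrd(int ord, boolean exact) {
            this.ord = ord;
            this.exact = exact;
        }
    }

    static ConvertedOrd convert(String[] lookup, String value) {
        int index = Arrays.binarySearch(lookup, value);
        if (index >= 0) {
            return new ConvertedOrd(index, true);
        }
        // Not found: binarySearch returned -(insertionPoint) - 1. The value sorts
        // just after ord (insertionPoint - 1), which is -1 if it precedes everything.
        int insertionPoint = -index - 1;
        return new ConvertedOrd(insertionPoint - 1, false);
    }

    /** Converts every entry (e.g. the values behind ords 23, 45, 76): one search each. */
    static ConvertedOrd[] convertAll(String[] lookup, String[] values) {
        ConvertedOrd[] out = new ConvertedOrd[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = convert(lookup, values[i]);
        }
        return out;
    }
}
```

A doc whose ord equals an inexact entry's converted ord actually sorts before that entry, so a comparator working on these ords has to break that tie using the `exact` flag rather than treating the two as equal.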
> >
> >     It's that inversion (from ord->Comparable in first seg, and
> >     Comparable->ord in second seg) that I'm trying to avoid (w/ this new
> >     proposal).
> >
> >     > That's 3 binary searches per conversion; I am not sure how you
> >     > can short-circuit it. Are you suggesting we use Comparables in
> >     > compareBottom until some doc beats it?
> >
> >     I'm saying that on segment transition you do get the Comparable for
> >     the current bottom, but you don't attempt to invert it.  Instead, as
> >     seg 2 finds a hit, you get that hit's Comparable and compare it to
> >     bottom.  If it beats bottom, it goes into the queue.  If it does not,
> >     you use its ord (in seg 2's ord space) to "learn" a bottom in the ord
> >     space of seg 2.
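
One possible reading of Mike's "learn a bottom" proposal, sketched under assumptions: an ascending String sort, a full queue, and the new segment's sorted ord -> value array available as `segmentValues`. LazyBottomSketch and its methods are hypothetical. Because ords increase with value, a hit that fails to beat bottom proves that every larger ord fails too, so later hits can be rejected by ord alone:

```java
// Hypothetical sketch of lazily learning bottom's position in a new segment's
// ord space instead of inverting it up front. Assumes an ascending String
// sort, a full queue, and a sorted ord -> value array for the segment.
final class LazyBottomSketch {
    private final String[] segmentValues; // ord -> value for the new segment
    private String bottomValue;           // bottom's Comparable, carried over on transition
    private int rejectFromOrd;            // every ord >= this is known not to beat bottom
    int valueComparisons = 0;             // counts the slow value compares, for illustration

    LazyBottomSketch(String[] segmentValues, String bottomValue) {
        this.segmentValues = segmentValues;
        this.bottomValue = bottomValue;
        this.rejectFromOrd = Integer.MAX_VALUE; // nothing learned yet
    }

    /** Returns true if a hit with this ord sorts strictly before bottom. */
    boolean beatsBottom(int ord) {
        if (ord >= rejectFromOrd) {
            return false; // fast path: pure ord comparison, no value lookup
        }
        valueComparisons++;
        if (segmentValues[ord].compareTo(bottomValue) < 0) {
            return true;
        }
        // This hit does not beat bottom, and neither can any larger ord
        // (ords increase with value), so tighten the learned bound.
        rejectFromOrd = ord;
        return false;
    }

    /** Called when a competitive hit evicts bottom; the new bottom sorts no later. */
    void updateBottom(String newBottomValue) {
        bottomValue = newBottomValue;
        // rejectFromOrd stays valid: values >= old bottom are also >= new bottom.
    }
}
```

Competitive hits still pay for a value compare every time, which is the cost weighed below against a single binary search on transition.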
> >
> >     > That would hurt performance a lot though, no?
> >
> >     Yeah, I think it likely would, since we're talking about a binary
> >     search on transition vs. possibly many upgrade-to-Comparable and
> >     compare-Comparables steps to slowly learn the equivalent ord in the
> >     new segment.  I was proposing it for cases where inversion is very
> >     difficult.  But realistically, since you must keep around the full
> >     ord -> Comparable mapping for every segment anyway (in order to merge
> >     in the end), inversion shouldn't ever actually be "difficult": it'd
> >     just be a binary search on presumably in-RAM storage.
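
Mike's point that every segment's ord -> Comparable table must be kept anyway can be illustrated with a sketch of the final merge, assuming each segment's top hits are stored as (global doc id, segment-local ord) pairs already sorted best first. SegmentMergeSketch and SegmentHits are hypothetical names, not Lucene classes; ties are broken by doc id:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of the end-of-search merge in a multi-PQ approach:
// each segment's hits hold ords in that segment's own ord space, so they
// are mapped back to values (ord -> Comparable) before a k-way merge.
final class SegmentMergeSketch {

    /** One segment's sorted top hits plus its ord -> value table. */
    static final class SegmentHits {
        final String[] lookup; // sorted ord -> value for this segment
        final int[] docs;      // global doc ids of the hits, best first
        final int[] ords;      // each hit's ord in this segment's ord space

        SegmentHits(String[] lookup, int[] docs, int[] ords) {
            this.lookup = lookup;
            this.docs = docs;
            this.ords = ords;
        }

        String valueAt(int i) {
            return lookup[ords[i]];
        }
    }

    /** Merges per-segment results into the global top n doc ids, ascending by value. */
    static List<Integer> mergeTopN(List<SegmentHits> segments, int n) {
        // Heap entries are {segmentIndex, positionInSegment}, ordered by value,
        // then by doc id so that earlier docs win ties.
        PriorityQueue<int[]> heap = new PriorityQueue<>((x, y) -> {
            SegmentHits sx = segments.get(x[0]);
            SegmentHits sy = segments.get(y[0]);
            int c = sx.valueAt(x[1]).compareTo(sy.valueAt(y[1]));
            return c != 0 ? c : Integer.compare(sx.docs[x[1]], sy.docs[y[1]]);
        });
        for (int s = 0; s < segments.size(); s++) {
            if (segments.get(s).docs.length > 0) {
                heap.add(new int[] {s, 0});
            }
        }
        List<Integer> result = new ArrayList<>();
        while (result.size() < n && !heap.isEmpty()) {
            int[] top = heap.poll();
            SegmentHits seg = segments.get(top[0]);
            result.add(seg.docs[top[1]]);
            if (top[1] + 1 < seg.docs.length) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return result;
    }
}
```

Since the per-segment lookup tables must exist for this merge, the binary search needed for inversion on transition runs over the same in-RAM data.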
> >
> >     Mike
> >
> >     ---------------------------------------------------------------------
> >     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >     <mailto:java-dev-unsubscribe@lucene.apache.org>
> >     For additional commands, e-mail: java-dev-help@lucene.apache.org
> >     <mailto:java-dev-help@lucene.apache.org>
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com

--0016e6509768f2afdf0476910f22--