lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: lucene 2.9 sorting algorithm
Date Fri, 23 Oct 2009 02:29:14 GMT
Why? What might he find? Whats with the cryptic request?

Why would Java 1.5 perform better than 1.6? It erases 20 and 40% gains?

I know point 2 certainly doesn't. Cards on the table?

John Wang wrote:
> Hey Michael:
>
>        Would you mind rerunning the test you have with jdk1.5?
>
>        Also, if you would, change the comparator method to avoid
> brachning for int and string comparators, e.g. 
>
>
>       return index.order[i.doc] - index.order[j.doc];
>
>
> Thanks
>
>
> -John
>
>
> On Thu, Oct 22, 2009 at 2:38 AM, Michael McCandless
> <lucene@mikemccandless.com <mailto:lucene@mikemccandless.com>> wrote:
>
>     On Thu, Oct 22, 2009 at 2:17 AM, John Wang <john.wang@gmail.com
>     <mailto:john.wang@gmail.com>> wrote:
>
>     >      I have been playing with the patch, and I think I have some
>     information
>     > that you might like.
>     >      Let me spend sometime and gather some more numbers and
>     update in jira.
>
>     Excellent!
>
>     >      say bottom has ords 23, 45, 76, each corresponding to a
>     string. When
>     > moving to the next segment, you need to make bottom to have ords
>     that can be
>     > comparable to other docs in this new segment, so you would need
>     to find the
>     > new ords for the values in 23,45 and 76, don't you? To find it,
>     assuming the
>     > values are s1,s2,s3, you would do a bin. search on the new val
>     array, and
>     > find index for s1,s2,s3.
>
>     It's that inversion (from ord->Comparable in first seg, and
>     Comparable->ord in second seg) that I'm trying to avoid (w/ this new
>     proposal).
>
>     > Which is 3 bin searches per convert, I am not sure
>     > how you can short circuit it. Are you suggesting we call
>     Comparable on
>     > compareBottom until some doc beats it?
>
>     I'm saying on seg transition you indeed get the Comparable for current
>     bottom, but, don't attempt to invert it.  Instead, as seg 2 finds a
>     hit, you get that hit's Comparables and compare to bottom.  If it
>     beats bottom, it goes into the queue.  If it does not, you use the ord
>     (in seg 2's ord space) to "learn" a bottom in the ord space of seg 2.
>
>     > That would hurt performance I lot though, no?
>
>     Yeah I think likely it would, since we're talking about a binary
>     search on transition VS having to do possibly many
>     upgrade-to-Comparable and compare-Comparabls to slowly learn the
>     equivalent ord in the new segment.  I was proposing it for cases where
>     inversion is very difficult.  But realistically, since you must keep
>     around the ful ord -> Comparable for every segment anyway (in order to
>     merge in the end), inversion shouldn't ever actually be "difficult" --
>     it'd just be a binary search on presumably in-RAM storage.
>
>     Mike
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>     <mailto:java-dev-unsubscribe@lucene.apache.org>
>     For additional commands, e-mail: java-dev-help@lucene.apache.org
>     <mailto:java-dev-help@lucene.apache.org>
>
>


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message