lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Wang <john.w...@gmail.com>
Subject Re: lucene 2.9 sorting algorithm
Date Fri, 23 Oct 2009 01:37:22 GMT
Hey Michael:
       Would you mind rerunning the test you have with jdk1.5?

       Also, if you would, change the comparator method to avoid brachning
for int and string comparators, e.g.


      return index.order[i.doc] - index.order[j.doc];


Thanks


-John

On Thu, Oct 22, 2009 at 2:38 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Thu, Oct 22, 2009 at 2:17 AM, John Wang <john.wang@gmail.com> wrote:
>
> >      I have been playing with the patch, and I think I have some
> information
> > that you might like.
> >      Let me spend sometime and gather some more numbers and update in
> jira.
>
> Excellent!
>
> >      say bottom has ords 23, 45, 76, each corresponding to a string. When
> > moving to the next segment, you need to make bottom to have ords that can
> be
> > comparable to other docs in this new segment, so you would need to find
> the
> > new ords for the values in 23,45 and 76, don't you? To find it, assuming
> the
> > values are s1,s2,s3, you would do a bin. search on the new val array, and
> > find index for s1,s2,s3.
>
> It's that inversion (from ord->Comparable in first seg, and
> Comparable->ord in second seg) that I'm trying to avoid (w/ this new
> proposal).
>
> > Which is 3 bin searches per convert, I am not sure
> > how you can short circuit it. Are you suggesting we call Comparable on
> > compareBottom until some doc beats it?
>
> I'm saying on seg transition you indeed get the Comparable for current
> bottom, but, don't attempt to invert it.  Instead, as seg 2 finds a
> hit, you get that hit's Comparables and compare to bottom.  If it
> beats bottom, it goes into the queue.  If it does not, you use the ord
> (in seg 2's ord space) to "learn" a bottom in the ord space of seg 2.
>
> > That would hurt performance I lot though, no?
>
> Yeah I think likely it would, since we're talking about a binary
> search on transition VS having to do possibly many
> upgrade-to-Comparable and compare-Comparabls to slowly learn the
> equivalent ord in the new segment.  I was proposing it for cases where
> inversion is very difficult.  But realistically, since you must keep
> around the ful ord -> Comparable for every segment anyway (in order to
> merge in the end), inversion shouldn't ever actually be "difficult" --
> it'd just be a binary search on presumably in-RAM storage.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

Mime
View raw message