lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: lucene 2.9 sorting algorithm
Date Wed, 21 Oct 2009 10:11:03 GMT
On Tue, Oct 20, 2009 at 11:55 AM, John Wang <> wrote:

> the simpler api places less restriction on the type of custom
> sorting that can be done.

Just to verify: this is not a back-compat break, right?

Because, in 2.4, such an interesting custom sort must've been
operating at the top-level index reader level, which is easy to carry
over to 2.9 (you just rebase the docIDs).

But, of course in moving to 2.9, you would like to also switch your
custom sort to be per-segment (for faster reopen/near real-time perf),
but the new sort API makes this more difficult because it requires
that you are able to compare hits across different segments during the
search, not just at the end.

But then I don't understand the difficulty of doing that: if we had a
Collector with the MultiPQ approach, at the end during merge, you'd
also have to compare results across segments, ie, upgrade your ords to
their real values.  The MultiPQ approach does this by calling
sortValue (returns Comparable) in the end.
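
To make that merge step concrete, here is a rough sketch of the MultiPQ idea in plain Java. It is not Lucene code: the class name, the int[]/String[] segment representation, and the topN helper are all made up for illustration, and it assumes an ascending sort where each segment's ords are ordered the same way as its values. Each segment gets its own queue that compares ords only, and the ords are turned into real values (the sortValue step) just once, at the end.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not a Lucene class.
class MultiPQSketch {
    // segOrds[s][doc] is the ord of a doc in segment s;
    // segValues[s][ord] is the real value behind that ord (sorted ascending).
    static List<String> topN(int[][] segOrds, String[][] segValues, int n) {
        List<String> merged = new ArrayList<>();
        for (int s = 0; s < segOrds.length; s++) {
            // Max-heap on ord: the head is this segment's weakest queued hit.
            PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
            for (int ord : segOrds[s]) {
                if (pq.size() < n) {
                    pq.add(ord);
                } else if (ord < pq.peek()) {
                    pq.poll();
                    pq.add(ord);
                }
            }
            // The "sortValue" step: upgrade ords to real values only now.
            for (int ord : pq) {
                merged.add(segValues[s][ord]);
            }
        }
        // Cross-segment comparison happens here, on values, after the search.
        Collections.sort(merged);
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }

    public static void main(String[] args) {
        int[][] ords = { {2, 0, 1, 2, 1}, {1, 2, 0, 2, 1} };
        String[][] values = { {"a", "c", "e"}, {"b", "d", "f"} };
        System.out.println(topN(ords, values, 2));
    }
}
```

The point of the sketch is only that the cross-segment comparison on values is still there; it is just deferred to the final merge.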

Putting performance aside for now... when comparing bottom, you don't
actually have to "truly invert" Comparable -> ord on segment
transition.  You could, instead, get the Comparable for each and
compare, but then note the smallest ord for the current segment that
has failed to compete, and short-circuit the compareBottom test by
checking against that ord. That should enable carrying over the custom
sort to the single PQ API without needing to invert ord->value.
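
A rough sketch of that short-circuit, again in plain Java rather than against the real comparator API. Everything here is hypothetical (the class, collectSegment, the int[]/String[] segment layout), and it assumes an ascending sort with ords ordered like their values within a segment. The shortcut is safe because the bottom of the queue only gets more competitive over time, so an ord that failed once, and any larger ord in the same segment, can never compete later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not a Lucene class. Assumes topN >= 1.
class SinglePQShortCircuitSketch {
    final int topN;
    // Max-heap on value: the head is the current "bottom" of the queue.
    final PriorityQueue<String> queue = new PriorityQueue<>(Collections.reverseOrder());
    // Smallest ord in the current segment that has failed to compete.
    int minFailedOrd = Integer.MAX_VALUE;
    // Counts the (potentially expensive) value-level comparisons.
    int valueComparisons = 0;

    SinglePQShortCircuitSketch(int topN) {
        this.topN = topN;
    }

    // docOrds[doc] is the doc's ord; ordToValue[ord] is the real value.
    void collectSegment(int[] docOrds, String[] ordToValue) {
        // Ords are only meaningful within one segment, so reset on transition.
        minFailedOrd = Integer.MAX_VALUE;
        for (int ord : docOrds) {
            if (queue.size() < topN) {
                queue.add(ordToValue[ord]);
                continue;
            }
            if (ord >= minFailedOrd) {
                continue; // short-circuit: cannot beat bottom, no value lookup
            }
            String value = ordToValue[ord];
            valueComparisons++;
            if (value.compareTo(queue.peek()) < 0) {
                queue.poll();      // evict the old bottom
                queue.add(value);
            } else {
                minFailedOrd = ord; // ord < minFailedOrd is guaranteed here
            }
        }
    }

    List<String> results() {
        List<String> out = new ArrayList<>(queue);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        SinglePQShortCircuitSketch c = new SinglePQShortCircuitSketch(2);
        c.collectSegment(new int[] {2, 0, 1, 2, 1}, new String[] {"a", "c", "e"});
        c.collectSegment(new int[] {1, 2, 0, 2, 1}, new String[] {"b", "d", "f"});
        System.out.println(c.results() + " after " + c.valueComparisons + " value comparisons");
    }
}
```

Note that no value is ever mapped back to an ord; the bottom is always compared by value, and the per-segment ord is used only as a cheap pre-filter.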

We'd obviously have to test performance...

Or, we could commit the MultiPQ approach as another sorting collector?
I know it's not great having two wildly different sort APIs, but both
APIs seem to have their strengths in different cases.

