lucene-dev mailing list archives

From Michael McCandless <>
Subject Re: BytesRef comparable
Date Mon, 03 May 2010 10:30:52 GMT
I agree -- objects that directly impl Comparable are great.

The problem is BytesRef is not really a concrete object.  It can't
know how the terms it's representing are supposed to sort.

Yet nearly all the time this sort will be Lucene's default term sort
(only custom codecs can change this), so I'm +1 on making BytesRef
sort according to that (note that this is not actually Java's natural
byte[] order: Java bytes are signed, and we must interpret the UTF8
bytes as unsigned to sort in Unicode code point order).  The expert
users of custom codecs that alter their sort order can be expected to
pass their own comparator.
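The unsigned-byte natural order proposed above can be sketched as follows. This is a hypothetical stand-in, not Lucene's `BytesRef` API: `ByteSlice`, `compareSigned` and `ByteSliceDemo` are illustrative names. It masks each byte with `0xff` so that UTF-8 sorts in Unicode code point order, and keeps Java's signed byte comparison alongside for contrast.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical BytesRef-like view over bytes[offset, offset + length)
// that defines a natural order by implementing Comparable.
final class ByteSlice implements Comparable<ByteSlice> {
    final byte[] bytes;
    final int offset;
    final int length;

    ByteSlice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    static ByteSlice of(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        return new ByteSlice(b, 0, b.length);
    }

    // Natural order: bytes compared as unsigned values 0..255.
    // For valid UTF-8 this equals Unicode code point order.
    @Override
    public int compareTo(ByteSlice other) {
        int n = Math.min(length, other.length);
        for (int i = 0; i < n; i++) {
            int a = bytes[offset + i] & 0xff;
            int b = other.bytes[other.offset + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // Equal prefix: the shorter slice sorts first.
        return length - other.length;
    }

    // For contrast: Java's signed byte comparison. Bytes 0x80..0xFF are
    // negative as Java bytes, so non-ASCII UTF-8 sorts before ASCII.
    static int compareSigned(ByteSlice x, ByteSlice y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Byte.compare(x.bytes[x.offset + i], y.bytes[y.offset + i]);
            if (c != 0) {
                return c;
            }
        }
        return x.length - y.length;
    }

    @Override
    public String toString() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}

class ByteSliceDemo {
    public static void main(String[] args) {
        ByteSlice[] terms = { ByteSlice.of("\u00e9t\u00e9"), ByteSlice.of("zoo"), ByteSlice.of("apple") };
        // No Comparator argument needed: ByteSlice implements Comparable.
        Arrays.sort(terms);
        // apple, zoo, then the accented term (lead byte 0xC3 > 0x7A unsigned)
        System.out.println(Arrays.toString(terms));
    }
}
```

Because `ByteSlice` implements `Comparable`, `Arrays.sort` works without a comparator, which is the convenience Yonik asks for in the quoted thread.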

But... we probably should do this after we switch to sorting terms by
Unicode code point order?  (LUCENE-2426)
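The reason the switch matters: UTF-16 code unit order (what `String.compareTo` uses) and code point order disagree for some inputs. The sketch below compares both on UTF-8 bytes. `TermOrderDemo`, `CODE_POINT_ORDER` and `UTF16_ORDER` are hypothetical names; `UTF16_ORDER` illustrates one way to reproduce UTF-16 order directly on UTF-8 bytes (moving lead bytes 0xEE/0xEF above 0xF0..0xF4) and is not the source of Lucene's `UTF8SortedAsUTF16Comparator`. `Arrays.compareUnsigned` assumes Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

class TermOrderDemo {
    // Code point order: unsigned lexicographic comparison of UTF-8 bytes.
    static final Comparator<byte[]> CODE_POINT_ORDER = Arrays::compareUnsigned;

    // UTF-16 code unit order computed on UTF-8 bytes (sketch).
    // Lead bytes 0xEE/0xEF encode U+E000..U+FFFF; UTF-16 sorts these above
    // supplementary characters (surrogates 0xD800..0xDFFF), whose UTF-8
    // lead bytes are 0xF0..0xF4. Remap 0xEE->0xFC and 0xEF->0xFD to match.
    static final Comparator<byte[]> UTF16_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                if (x >= 0xee && y >= 0xee) {
                    if ((x & 0xfe) == 0xee) x += 0x0e;
                    if ((y & 0xfe) == 0xee) y += 0x0e;
                }
                return x - y;
            }
        }
        return a.length - b.length;
    };

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String bmpHigh = "\uFF61";          // U+FF61, UTF-8 EF BD A1
        String supplementary = "\uD83D\uDE00"; // U+1F600, UTF-8 F0 9F 98 80

        // String.compareTo compares UTF-16 code units: 0xFF61 > 0xD83D
        System.out.println(Integer.signum(bmpHigh.compareTo(supplementary)));                              // 1
        System.out.println(Integer.signum(CODE_POINT_ORDER.compare(utf8(bmpHigh), utf8(supplementary)))); // -1
        System.out.println(Integer.signum(UTF16_ORDER.compare(utf8(bmpHigh), utf8(supplementary))));      // 1
    }
}
```

For valid strings the two orders only disagree when a supplementary character (above U+FFFF) meets a BMP character in U+E000..U+FFFF at the first differing position; otherwise unsigned UTF-8 byte order and `String.compareTo` agree.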


On Mon, May 3, 2010 at 6:15 AM, Shai Erera <> wrote:
> I don't know what Yonik's specific use case is, but I generally like objects
> that are Comparable, rather than passing around Comparators. For example, I
> think that if ScoreDoc were Comparable, fewer people would need to extend PQ
> as well as ScoreDoc. They could just impl their ScoreDocExt sort logic ...
> Comparators however give you more flexibility, if e.g. you want to compare
> the same objects using different criteria.
> You can still do both - the object controls its "natural" order while
> external comparators can sort differently.
> It also depends on whether you can pass the Comparator down the call stack or
> not.
> I myself am for having objects implement Comparable (when it makes sense)
> and also open up the use of Comparator if that really is needed.
> Shai
> On Mon, May 3, 2010 at 12:36 PM, Michael McCandless
> <> wrote:
>> It used to implement Comparable (hardwired to natural byte[] order),
>> but I removed it, so that all comparisons are forced to be explicit.
>> The problem is... it's dangerous to assume comparison in natural order
>> is always correct.  E.g., Lucene today sorts terms using
>> UTF8SortedAsUTF16Comparator (which is not natural byte[] ordering).
>> Of course we are moving away from this, so terms will by default be
>> sorted in Unicode code point order (which matches UTF8 byte[] natural
>> order), under LUCENE-2426, but a codec could still customize the sort
>> order.
>> We could still put it back, hardwired to natural byte order?  And
>> javadoc the dangers...
>> Or I guess we could tell each BytesRef the comparator it should
>> delegate to, but that's rather inefficient.
>> Where are you needing to compare BytesRefs?  And which container won't
>> accept Comparator...?
>> Mike
>> On Sun, May 2, 2010 at 1:32 PM, Yonik Seeley <>
>> wrote:
>> > Any objections to making BytesRef comparable?  It would make it much
>> > easier to use with containers that don't take comparators as
>> > parameters.
>> >
>> > -Yonik
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail:
>> > For additional commands, e-mail:
>> >
>> >

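Shai's suggestion in the thread (a natural order via `Comparable`, with external `Comparator`s for other criteria) and Yonik's point about containers that take no comparator can be illustrated together. `Hit`, `BY_DOC` and `HitDemo` are hypothetical names; `Hit` only mimics the doc/score pair of a `ScoreDoc` and is not Lucene's class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical ScoreDoc-like value with a natural order plus an
// external comparator for a different criterion.
final class Hit implements Comparable<Hit> {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }

    // Natural order: higher score first; ties broken by lower doc id.
    @Override
    public int compareTo(Hit other) {
        int c = Float.compare(other.score, score); // arguments swapped: descending
        return c != 0 ? c : Integer.compare(doc, other.doc);
    }

    // External comparator: ascending doc id, ignoring score.
    static final Comparator<Hit> BY_DOC = Comparator.comparingInt(h -> h.doc);

    @Override
    public String toString() {
        return doc + ":" + score;
    }
}

class HitDemo {
    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>(Arrays.asList(
                new Hit(7, 3.0f), new Hit(2, 1.5f), new Hit(4, 1.5f)));

        // A TreeSet built without a Comparator falls back to compareTo.
        TreeSet<Hit> ranked = new TreeSet<>(hits);
        System.out.println(ranked); // [7:3.0, 2:1.5, 4:1.5]

        // The same objects, ordered differently by passing a Comparator.
        hits.sort(Hit.BY_DOC);
        System.out.println(hits);   // [2:1.5, 4:1.5, 7:3.0]
    }
}
```

If the element type did not implement `Comparable`, the comparator-less `TreeSet` would throw `ClassCastException` on insertion, which is the situation Yonik describes.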

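Mike's other option in the thread, telling each BytesRef which comparator to delegate to, looks roughly like the sketch below. `OrderedBytes` and `OrderedBytesDemo` are hypothetical names, and `Arrays.compareUnsigned` assumes Java 9 or later. It makes the cost visible: every instance stores an extra reference, and `compareTo` has to check that both operands agree on the order.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical: each instance remembers the order it should sort by.
final class OrderedBytes implements Comparable<OrderedBytes> {
    final byte[] bytes;
    final Comparator<byte[]> order; // one extra reference per instance

    OrderedBytes(byte[] bytes, Comparator<byte[]> order) {
        this.bytes = bytes;
        this.order = order;
    }

    @Override
    public int compareTo(OrderedBytes other) {
        // Values built with different orders cannot be compared meaningfully.
        // The check is by identity, so equivalent but distinct comparator
        // objects are rejected too.
        if (order != other.order) {
            throw new IllegalArgumentException("operands use different comparators");
        }
        return order.compare(bytes, other.bytes);
    }
}

class OrderedBytesDemo {
    public static void main(String[] args) {
        Comparator<byte[]> unsigned = Arrays::compareUnsigned; // code point order for UTF-8
        OrderedBytes a = new OrderedBytes(new byte[] {0x61}, unsigned);                     // "a"
        OrderedBytes e = new OrderedBytes(new byte[] {(byte) 0xC3, (byte) 0xA9}, unsigned); // U+00E9
        System.out.println(a.compareTo(e) < 0); // true
    }
}
```

A single hardwired natural order avoids both the per-instance field and the runtime consistency check, at the price of being wrong for codecs that customize the term sort.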