lucene-java-user mailing list archives

From "Rose, Stuart J"
Subject RE: BytesRef equals() method
Date Tue, 21 Jan 2014 23:34:57 GMT
I agree that comparing the BytesRef lengths in an equals() method seems counter to the purpose
of having a BytesRef class. 

I'd recommend taking a look at the BytesRefHash which maps BytesRef objects to unique ids
as it 'may' be more efficient than converting to Strings. 
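
To illustrate the idea of mapping byte content to unique ids, here is a minimal sketch using only the JDK. ByteSliceIds and idFor are hypothetical names chosen for this example; this is a stand-in for the concept, not the BytesRefHash implementation or its API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the "bytes -> unique id" idea; not Lucene code.
final class ByteSliceIds {
    private final Map<ByteBuffer, Integer> ids = new HashMap<>();

    /** Returns the id already assigned to this byte content, or assigns the next free id. */
    int idFor(byte[] bytes, int offset, int length) {
        // Probe without copying: ByteBuffer equality and hashing use the wrapped range only.
        ByteBuffer probe = ByteBuffer.wrap(bytes, offset, length);
        Integer existing = ids.get(probe);
        if (existing != null) {
            return existing;
        }
        // Copy before storing so the key cannot change if the caller reuses its array.
        byte[] copy = Arrays.copyOfRange(bytes, offset, offset + length);
        int id = ids.size();
        ids.put(ByteBuffer.wrap(copy), id);
        return id;
    }

    int size() {
        return ids.size();
    }
}
```

With dense ids like these, per-value aggregates can be kept in a plain array indexed by id instead of in a map keyed by the bytes themselves.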


-----Original Message-----
From: Yann-Erwan Perio
Sent: Tuesday, January 21, 2014 7:33 AM
Subject: BytesRef equals() method


I have been working a bit with BytesRef recently, and I wonder whether the content of the
equals() method, and more specifically the content of the bytesEquals(BytesRef other) method,
is the intended one.

Here is my use case. I work with Lucene 4.6.0. During indexing, using a custom tokenizer,
I have added some payloads onto some tokens. Using an extension of the Default Similarity,
I was then able to retrieve these payloads, passing them to a collector of mine, so as to
perform aggregation calculations. I noticed that the BytesRef instances retrieved were not exactly
the same as the indexed ones: their real content was the same, but their offsets would differ.

I was made aware of this because I used a Map<BytesRef, ...> in the collector, and the
map would sometimes give inconsistent results.
Checking out the source code, the hashCode() method looks valid to me, but the bytesEquals()
method looks strange: before comparing the real content of the two BytesRef instances, it checks
their lengths, and AIUI these may differ even though the instances are logically equal.
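
For illustration, here is a minimal sketch of content-based equality on an (array, offset, length) slice. ByteSlice is a hypothetical class written against the JDK only, not Lucene's BytesRef, and it assumes that length means the number of valid bytes in the slice rather than the size of the backing array. Under that assumption, checking the lengths first does not reject slices that differ only in their offsets.

```java
// Simplified, hypothetical model of an (array, offset, length) slice; not Lucene's BytesRef.
final class ByteSlice {
    final byte[] bytes;
    final int offset;
    final int length; // assumed: number of valid bytes, not the backing array size

    ByteSlice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ByteSlice)) {
            return false;
        }
        ByteSlice other = (ByteSlice) o;
        if (length != other.length) {
            return false; // a different number of valid bytes means different content
        }
        // Compare byte by byte, each slice starting from its own offset.
        for (int i = 0; i < length; i++) {
            if (bytes[offset + i] != other.bytes[other.offset + i]) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int hashCode() {
        // Hash only the valid range so that equal slices hash alike.
        int h = 0;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + bytes[i];
        }
        return h;
    }
}
```

In this model, two slices over different arrays and at different offsets are equal as long as their valid bytes match, so a mismatch in offsets alone would not explain inconsistent map lookups.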

I am not familiar at all with the internals of Lucene (this includes the BytesRef mechanics),
so I may be completely wrong here. FWIW, I solved my problem by creating fresh BytesRef from
the ones sent by the similarity, using the copyBytes method. I could also have used the string
representation of the BytesRef, but this appears to be slower than copying the bytes, by a
factor of about 2.5.
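
One possible reason why copying helps is sketched below, under the assumption (not confirmed above) that the producer reuses a single buffer for successive values. A map key whose content changes after insertion can no longer be found reliably. ReusedBufferDemo and count are hypothetical names, and java.nio.ByteBuffer stands in for BytesRef so that the example needs only the JDK.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of mutable map keys; uses ByteBuffer as a stand-in for BytesRef.
final class ReusedBufferDemo {
    /**
     * Counts one-byte values. When copy is false, every key wraps the same
     * shared array, so earlier keys change content on each iteration.
     */
    static Map<ByteBuffer, Integer> count(byte[][] values, boolean copy) {
        Map<ByteBuffer, Integer> counts = new HashMap<>();
        byte[] shared = new byte[1]; // assumed: the producer reuses one buffer
        for (byte[] value : values) {
            shared[0] = value[0];
            ByteBuffer key = copy
                    ? ByteBuffer.wrap(Arrays.copyOf(shared, shared.length)) // private snapshot
                    : ByteBuffer.wrap(shared);                              // aliases the shared array
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

With copy set to true, each key owns its bytes and the counts come out as expected; with copy set to false, all stored keys end up showing the last value written to the shared array, so lookups for earlier values miss.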

Kind regards.

To unsubscribe, e-mail:
For additional commands, e-mail:
