lucene-java-user mailing list archives

From "Rose, Stuart J" <Stuart.R...@pnnl.gov>
Subject RE: BytesRef equals() method
Date Tue, 21 Jan 2014 23:34:57 GMT
I agree that comparing the BytesRef lengths in an equals() method seems counter to the purpose
of having a BytesRef class. 

I'd recommend taking a look at BytesRefHash, which maps BytesRef objects to unique ids,
as it 'may' be more efficient than converting to Strings. 
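A rough sketch of the idea, assuming only what is said above (each distinct byte sequence gets a unique id). `DenseByteIds` and `idOf` are hypothetical names, and this plain-JDK stand-in does not reproduce BytesRefHash's actual API or storage layout. Ids are dense (0, 1, 2, ...), so a collector could keep its aggregates in arrays indexed by id instead of a Map keyed by Strings.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a bytes-to-id table: not Lucene's BytesRefHash.
final class DenseByteIds {
    // ByteBuffer.equals()/hashCode() compare only the remaining bytes,
    // so wrapped windows behave as content-based keys.
    private final Map<ByteBuffer, Integer> ids = new HashMap<>();
    private final List<byte[]> values = new ArrayList<>();

    // Returns the id of bytes[offset, offset + length), assigning the next id on first sight.
    int idOf(byte[] bytes, int offset, int length) {
        ByteBuffer probe = ByteBuffer.wrap(bytes, offset, length); // lookup without copying
        Integer existing = ids.get(probe);
        if (existing != null) {
            return existing;
        }
        // Store a private copy, so the caller is free to reuse its buffer afterwards.
        byte[] copy = Arrays.copyOfRange(bytes, offset, offset + length);
        int id = values.size();
        values.add(copy);
        ids.put(ByteBuffer.wrap(copy), id);
        return id;
    }

    int size() {
        return values.size();
    }

    // Returns a copy of the bytes stored under the given id.
    byte[] get(int id) {
        return values.get(id).clone();
    }
}
```

For example, the windows "ab" at offset 1 and "ab" at offset 4 of the same buffer both map to id 0, while "abc" gets id 1.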

Stuart


-----Original Message-----
From: Yann-Erwan Perio [mailto:ye.perio@gmail.com] 
Sent: Tuesday, January 21, 2014 7:33 AM
To: java-user@lucene.apache.org
Subject: BytesRef equals() method

Hello,

I have been working a bit with BytesRef recently, and I wonder whether the equals() method,
and more specifically the bytesEquals(BytesRef other) method, behaves as intended.

Here is my use case. I work with Lucene 4.6.0. During indexing, using a custom tokenizer,
I added payloads to some tokens. Using an extension of DefaultSimilarity, I was then able
to retrieve these payloads and pass them to a collector of mine to perform aggregation
calculations. I noticed that the BytesRef objects retrieved were not exactly the same as the
ones indexed: their actual content was the same, but their offsets differed.

I was made aware of this because I used a Map<BytesRef, ...> in the collector, and the
map would sometimes give inconsistent results.
Having checked the source code, I find the hashCode() method valid, but the bytesEquals()
method looks strange: before comparing the actual content of the two BytesRef objects, it
checks their lengths, and AIUI these may differ even when the BytesRef objects are logically equal.
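To make the comparison concrete, here is a minimal sketch of window-based equality. `ByteWindow` is a hypothetical stand-in written for illustration, not Lucene's class, and Lucene's own code may differ in detail. In BytesRef, `length` is the number of valid bytes in the view rather than the size of the backing array, so in this sketch two views of the same content always pass the length check, whatever their offsets.

```java
// Hypothetical stand-in for a BytesRef-like view: `length` valid bytes starting at
// `offset` inside a possibly larger, possibly shared `bytes` array.
final class ByteWindow {
    final byte[] bytes;
    final int offset;
    final int length;

    ByteWindow(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Length check first, then a byte-by-byte comparison of the two windows.
    // Offsets are never compared, so equal content at different offsets is equal.
    boolean bytesEquals(ByteWindow other) {
        if (length != other.length) {
            return false; // different number of valid bytes: contents cannot match
        }
        for (int i = 0; i < length; i++) {
            if (bytes[offset + i] != other.bytes[other.offset + i]) {
                return false;
            }
        }
        return true;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ByteWindow && bytesEquals((ByteWindow) o);
    }

    // Hashes only the valid window, so equal windows always get equal hash codes.
    // The 31-based formula is illustrative, not necessarily the one Lucene uses.
    @Override
    public int hashCode() {
        int h = 0;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + bytes[i];
        }
        return h;
    }
}
```

For example, `new ByteWindow(new byte[] {9, 9, 'a', 'b', 'c'}, 2, 3)` equals `new ByteWindow(new byte[] {'a', 'b', 'c', 7}, 0, 3)`, while a window of length 4 over the second array does not.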

I am not at all familiar with the internals of Lucene (including the BytesRef mechanics),
so I may be completely wrong here. FWIW, I solved my problem by creating fresh BytesRef objects
from the ones passed in by the similarity, using the copyBytes method. I could also have used
the string representation of the BytesRef, but this appears to be slower than copying the
bytes by a factor of about 2.5.
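One way a Map keyed by byte views can return inconsistent results, even with a content-based equals(), is when a key's backing array is reused and overwritten after the key has been inserted; whether that is what happened in this case is an assumption. The sketch below reproduces that failure with the JDK's `java.nio.ByteBuffer`, whose equals() and hashCode() likewise consider only the remaining bytes (a length check, then a content comparison). It is an analogy, not Lucene code; `ReusedKeyDemo` and its methods are hypothetical, and the `copyOf` step plays the role of the copyBytes fix described above.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: counting payloads that arrive through one reused buffer.
final class ReusedKeyDemo {
    // With copy == false the map keys alias the shared buffer and change under the map.
    static Map<ByteBuffer, Integer> count(byte[][] payloads, boolean copy) {
        byte[] shared = new byte[16]; // overwritten for every hit, like a recycled buffer
        Map<ByteBuffer, Integer> counts = new HashMap<>();
        for (byte[] p : payloads) {
            System.arraycopy(p, 0, shared, 4, p.length); // content lands at offset 4
            ByteBuffer view = ByteBuffer.wrap(shared, 4, p.length);
            ByteBuffer key = copy ? copyOf(view) : view;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Deep copy: a fresh array holding only the valid bytes, starting at offset 0.
    static ByteBuffer copyOf(ByteBuffer view) {
        byte[] fresh = new byte[view.remaining()];
        view.duplicate().get(fresh); // duplicate() leaves the original view's position alone
        return ByteBuffer.wrap(fresh);
    }
}
```

With the hits "x", "y", "x", the copying variant ends up with x=2 and y=1. The aliasing variant also holds two entries, but after the last overwrite both keys read "x", so looking up "y" returns null.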

Kind regards.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
