lucene-java-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Why Two Levels of Indirection in BytesRefHash class ?
Date Sun, 11 Dec 2016 12:04:58 GMT
Adrien Grand <jpountz@gmail.com> wrote:
> That would work if you are only interested in using BytesRefHash as a hash
> set for byte[]. However these incremental ids are useful if you want to
> associate data with each byte[]: you can create parallel arrays and use the
> ids returned by the BytesRefHash as indices in these arrays.
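
The parallel-array usage described in the quote can be illustrated with a minimal sketch. It is not Lucene code: a HashMap stands in for the BytesRefHash, and the class name TermCounts and its methods are hypothetical. The only point is that densely assigned, incremental ids can index a plain int[] directly.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: incremental ids used as indices into a parallel array. */
class TermCounts {
    // Stand-in for BytesRefHash: maps each distinct term to the next free id (0, 1, 2, ...).
    private final Map<String, Integer> ids = new HashMap<>();
    // Parallel array: counts[id] holds the data associated with the term that has that id.
    private int[] counts = new int[4];

    /** Adds one occurrence of the term and returns its id. */
    int increment(String term) {
        Integer id = ids.get(term);
        if (id == null) {
            id = ids.size();                  // ids are handed out in insertion order
            ids.put(term, id);
            if (id == counts.length) {        // grow the parallel array alongside the ids
                counts = Arrays.copyOf(counts, counts.length * 2);
            }
        }
        counts[id]++;
        return id;
    }

    /** Returns the count stored for an id previously returned by increment. */
    int countOf(int id) {
        return counts[id];
    }
}
```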

That could be solved by prepending the counter value to each stored BytesRef, then applying
a fixed +4 delta to the offset to reach the BytesRef itself. The space requirements are the same
as now, but there is one less level of indirection, which should mean fewer CPU cache misses.
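
A minimal sketch of that layout, under loud assumptions: this is not Lucene code, the class IdPrefixedHash and all of its members are hypothetical, a single growable byte[] stands in for the ByteBlockPool, values are limited to 127 bytes so the length fits in one byte, and the table is never rehashed. Each hash slot holds a pool offset directly, and the entry at that offset is laid out as a 4-byte id, a 1-byte length, then the bytes.

```java
import java.util.Arrays;

/** Hypothetical sketch: hash slots point straight into the pool, ids live in the pool. */
class IdPrefixedHash {
    private static final int ID_BYTES = 4;   // the fixed +4 delta from id to length/bytes
    private byte[] pool = new byte[64];      // stand-in for ByteBlockPool
    private int poolEnd = 0;
    private final int[] slots;               // slot -> pool offset, -1 when empty
    private int count = 0;

    IdPrefixedHash(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new int[capacity];
        Arrays.fill(slots, -1);
    }

    /** Returns the id of the value, adding it with the next incremental id if absent. */
    int add(byte[] value) {
        if (value.length > 127) {
            throw new IllegalArgumentException("sketch stores the length in one byte");
        }
        int mask = slots.length - 1;
        int slot = Arrays.hashCode(value) & mask;
        while (slots[slot] != -1) {          // linear probing; one empty slot always remains
            if (equalsAt(slots[slot], value)) {
                return idAt(slots[slot]);    // single hop: slot -> pool
            }
            slot = (slot + 1) & mask;
        }
        if (count == slots.length - 1) {
            throw new IllegalStateException("table full; this sketch does not rehash");
        }
        int needed = ID_BYTES + 1 + value.length;
        if (poolEnd + needed > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, poolEnd + needed));
        }
        int offset = poolEnd;
        pool[offset] = (byte) (count >>> 24);            // id, big-endian
        pool[offset + 1] = (byte) (count >>> 16);
        pool[offset + 2] = (byte) (count >>> 8);
        pool[offset + 3] = (byte) count;
        pool[offset + ID_BYTES] = (byte) value.length;   // length prefix
        System.arraycopy(value, 0, pool, offset + ID_BYTES + 1, value.length);
        poolEnd += needed;
        slots[slot] = offset;
        return count++;
    }

    /** Reads the id stored in front of the entry at the given pool offset. */
    int idAt(int offset) {
        return ((pool[offset] & 0xFF) << 24) | ((pool[offset + 1] & 0xFF) << 16)
             | ((pool[offset + 2] & 0xFF) << 8) | (pool[offset + 3] & 0xFF);
    }

    private boolean equalsAt(int offset, byte[] value) {
        int start = offset + ID_BYTES;       // skip the id prefix
        int len = pool[start];
        return Arrays.equals(pool, start + 1, start + 1 + len, value, 0, value.length);
    }
}
```

Compared with a slot -> id -> bytesStart[id] -> pool lookup, the sketch reads the id and the bytes from the same region of the pool. Since entries are found by pool offset rather than by id, a lookup from id back to bytes is no longer available without extra bookkeeping.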

However, this removes the nice property of insertion-order iteration over the DocValues
in the structure, so it would be quite a change to the current code.


One optimization, while we are on the subject, is to exploit the indirection. As the bytesStarts
are monotonically increasing offsets into the ByteBlockPool, there is no need to store the length
of the BytesRefs. It can be calculated as bytesStarts[id+1] - bytesStarts[id], with the current end
of the pool standing in for the entry after the last one. This saves 1-2 bytes per entry and
preserves memory locality, so it should have the same performance as now (needs to be tested of course).
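
A minimal sketch of deriving lengths from neighbouring start offsets. It is not Lucene code: the class ImplicitLengthStore and its members are hypothetical, and it assumes one contiguous byte[] with no padding between entries. Whether that assumption holds at ByteBlockPool block boundaries would need checking.

```java
import java.util.Arrays;

/** Hypothetical sketch: no stored length, only monotonically increasing start offsets. */
class ImplicitLengthStore {
    private byte[] pool = new byte[16];   // stand-in for ByteBlockPool
    // starts[id] is where value id begins; starts[count] is the current end of the pool.
    private int[] starts = new int[8];
    private int count = 0;

    /** Appends the value and returns its incremental id. */
    int add(byte[] value) {
        int start = starts[count];
        int end = start + value.length;
        if (end > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, end));
        }
        System.arraycopy(value, 0, pool, start, value.length);
        if (count + 2 > starts.length) {  // room for the sentinel entry at count + 1
            starts = Arrays.copyOf(starts, starts.length * 2);
        }
        starts[count + 1] = end;          // doubles as the next value's start
        return count++;
    }

    /** Length is the distance to the next start; nothing is stored per entry. */
    int length(int id) {
        return starts[id + 1] - starts[id];
    }

    /** Copies the bytes of the value with the given id out of the pool. */
    byte[] get(int id) {
        return Arrays.copyOfRange(pool, starts[id], starts[id + 1]);
    }
}
```

The sketch only works while ids follow insertion order, which is the property the calculation relies on.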

- Toke Eskildsen


