hbase-dev mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject DiffKeyDeltaEncoder not fully efficient?
Date Sun, 18 Sep 2016 16:37:18 GMT

I was reading the key encoder code and noticed this in the DiffKeyDeltaEncoder:

if ((flag & FLAG_TIMESTAMP_IS_DIFF) == 0) {
  ByteBufferUtils.putLong(out, timestamp, timestampFitsInBytes);
} else {
  ByteBufferUtils.putLong(out, diffTimestamp, diffTimestampFitsInBytes);
}

So if the timestamp is the same as the one from the previous cell, we
store it again in full? This makes no sense, as we otherwise omit all
identical fields. I would assume this is a mistake?

In the decoding we do this:

// handle timestamp
int timestampFitsInBytes =
long timestamp = ByteBufferUtils.readLong(source, timestampFitsInBytes);
if ((flag & FLAG_TIMESTAMP_SIGN) != 0) {
  timestamp = -timestamp;
}
if ((flag & FLAG_TIMESTAMP_IS_DIFF) != 0) {
  timestamp = state.timestamp - timestamp;
}

Couldn't this simply reuse state.timestamp when the value is
unchanged? Right now we add six bytes for the cell timestamp even
when it is identical to the previous one.
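
To make the idea concrete, here is a minimal sketch of what I have in mind. It is not the DiffKeyDeltaEncoder code: the class, the FLAG_TIMESTAMP_IS_SAME bit, and both method names are made up for illustration, and the non-identical case is simplified to a plain 8-byte long instead of the variable-width encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch only; not the HBase DiffKeyDeltaEncoder implementation.
final class SameTimestampSketch {
  // Assumed flag bit, invented for this illustration.
  static final int FLAG_TIMESTAMP_IS_SAME = 1;

  /** Writes the timestamp only if it differs from the previous cell; returns the flag. */
  static int encode(ByteBuffer out, long prevTimestamp, long timestamp) {
    if (timestamp == prevTimestamp) {
      // Nothing is written; the decoder reuses the previous value.
      return FLAG_TIMESTAMP_IS_SAME;
    }
    // Simplification: always 8 bytes here, no variable width and no delta.
    out.putLong(timestamp);
    return 0;
  }

  /** Restores the timestamp, reading bytes only when the flag is not set. */
  static long decode(ByteBuffer in, int flag, long prevTimestamp) {
    if ((flag & FLAG_TIMESTAMP_IS_SAME) != 0) {
      return prevTimestamp;
    }
    return in.getLong();
  }
}
```

With something along these lines, a cell that repeats the previous timestamp would cost zero timestamp bytes, at the price of one more flag bit, which may or may not be available in the current flag byte.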

If they are not identical, we store the delta, so I guess in practice
identical timestamps do not occur too often, unless they are
client-supplied, say to emulate transactions within a row.
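
For reference, the six-byte figure follows from the magnitude of a current epoch-millisecond value. A small sketch, assuming the width is simply the number of bytes needed for the non-negative value (the helper name bytesNeeded is mine, not an HBase method):

```java
// Hypothetical helper for illustration; not part of HBase.
final class TimestampWidthSketch {
  /** Smallest number of bytes (1..8) that can hold a non-negative value. */
  static int bytesNeeded(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("value must be non-negative");
    }
    int bits = 64 - Long.numberOfLeadingZeros(value);
    // Round up to whole bytes; zero still occupies one byte in this sketch.
    return Math.max(1, (bits + 7) / 8);
  }

  public static void main(String[] args) {
    long full = 1474216638000L; // an epoch-millisecond value from September 2016
    long delta = 250L;          // two cells written 250 ms apart
    System.out.println(bytesNeeded(full));  // 6
    System.out.println(bytesNeeded(delta)); // 1
  }
}
```

So a small delta costs one byte, while repeating the full timestamp costs six.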

Should I open a ticket?

