hbase-dev mailing list archives

From Дмитрий <dimka-...@i.ua>
Subject FastDiffDeltaEncoder improvements
Date Mon, 12 Sep 2016 07:46:41 GMT
Hi all, 

I would like to discuss the data block encoding implementations available in HBase and how
we can improve them.
The most interesting one for me is FastDiffDeltaEncoder, because it encodes not only keys but
also other fields such as timestamp, type, and key length. It also removes duplicated values,
which in my view is its most questionable feature. Consider the following image:

http://i68.tinypic.com/8z2wzn.png

This is an example of a small table with row keys Row-1, Row-2, Row-3 and columns Column-A,
Column-B, Column-C.
DataBlockEncoder encodes cells in key order. Each key consists of the row key, family, and
qualifier, so the cells are encoded in the order shown by the blue line in the image.

FastDiffDeltaEncoder computes the difference between two consecutive cells. As a result, the
duplicated values in Column-A are not removed, because cells of the same column are never
adjacent in that order. The only case where duplicate removal works is a single-column table.
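To make the ordering argument concrete, here is a minimal sketch that does not use any HBase classes. The Cell class, the sample values, and the helper names are hypothetical; it is assumed that Column-A holds the same value in every row, as in the image. It only counts how many values equal the value of the immediately preceding cell.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AdjacentDuplicateDemo {
    /** Simplified, hypothetical cell: row, qualifier and value only (no family, timestamp or type). */
    static final class Cell {
        final String row;
        final String qualifier;
        final byte[] value;

        Cell(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Counts cells whose value equals the value of the cell immediately before them. */
    static int countAdjacentDuplicates(List<Cell> cells) {
        int duplicates = 0;
        for (int i = 1; i < cells.size(); i++) {
            if (Arrays.equals(cells.get(i - 1).value, cells.get(i).value)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    /** Builds cells in key order (row first, then qualifier). Column-A always holds "same". */
    static List<Cell> sampleCells(String... qualifiers) {
        List<Cell> cells = new ArrayList<>();
        for (int r = 1; r <= 3; r++) {
            for (String q : qualifiers) {
                String value = q.equals("Column-A") ? "same" : q + "/value-" + r;
                cells.add(new Cell("Row-" + r, q, value));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // Three columns: the repeated Column-A values are never neighbors.
        System.out.println(countAdjacentDuplicates(sampleCells("Column-A", "Column-B", "Column-C")));
        // Single column: every Column-A value follows another Column-A value.
        System.out.println(countAdjacentDuplicates(sampleCells("Column-A")));
    }
}
```

With three columns the comparison finds no duplicates, while the single-column variant finds two.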

So my suggestion is to detect duplicates within each column, not only between neighboring
cells. I have also heard the idea of not just removing duplicated values but computing a
prefix difference between them, as is done for keys.
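A rough sketch of what a prefix difference between two values could look like. This is not the FastDiffDeltaEncoder code; the class and method names are hypothetical and the on-disk layout is left out. A value is represented by the length of the prefix it shares with the previous value plus the remaining suffix.

```java
import java.util.Arrays;

class ValuePrefixDelta {
    /** Number of leading bytes that previous and current have in common. */
    static int commonPrefixLength(byte[] previous, byte[] current) {
        int limit = Math.min(previous.length, current.length);
        int i = 0;
        while (i < limit && previous[i] == current[i]) {
            i++;
        }
        return i;
    }

    /** Bytes of current that remain after the shared prefix; empty for an exact duplicate. */
    static byte[] suffix(byte[] previous, byte[] current) {
        int prefixLength = commonPrefixLength(previous, current);
        return Arrays.copyOfRange(current, prefixLength, current.length);
    }

    /** Rebuilds the original value from the previous value, the prefix length and the suffix. */
    static byte[] restore(byte[] previous, int prefixLength, byte[] suffix) {
        // Copy the shared prefix, then append the suffix after it.
        byte[] value = Arrays.copyOf(previous, prefixLength + suffix.length);
        System.arraycopy(suffix, 0, value, prefixLength, suffix.length);
        return value;
    }
}
```

An exact duplicate then becomes the special case where the prefix length equals the value length and the suffix is empty, so duplicate removal and prefix encoding could share one code path.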

To implement this we have to keep the previous value for each column. The most efficient way,
in my opinion, is to keep them in a HashMap with ByteArrayWrapper keys. The size of this map
will equal the number of unique columns in the block being encoded.
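A minimal sketch of the per-column map, under some assumptions: java.nio.ByteBuffer.wrap stands in for the ByteArrayWrapper key type (ByteBuffer compares by content in equals and hashCode), the qualifier alone identifies a column (the family is ignored), and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PerColumnDuplicateDetector {
    // Previous value per column. ByteBuffer is an assumed stand-in for a
    // byte-array wrapper with content-based equals/hashCode.
    private final Map<ByteBuffer, byte[]> previousByColumn = new HashMap<>();

    /**
     * Returns true if value equals the last value seen for the same qualifier,
     * then remembers value as the new previous value for that column.
     * The arrays are not copied, so callers must not modify them afterwards.
     */
    boolean isDuplicate(byte[] qualifier, byte[] value) {
        byte[] previous = previousByColumn.put(ByteBuffer.wrap(qualifier), value);
        return previous != null && Arrays.equals(previous, value);
    }

    /** Number of distinct columns seen since the last reset. */
    int trackedColumns() {
        return previousByColumn.size();
    }

    /** Clears the state; intended to be called at the start of each block. */
    void reset() {
        previousByColumn.clear();
    }
}
```

One thing this sketch makes visible: the decoder would need to rebuild the same map while reading, so any lookup cost is paid on both the write and the read path.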

This looks very easy to implement, but I guess there must be some hidden obstacles, because
it has not been implemented yet.

What do you think about the idea? Is there a more efficient way (in terms of CPU/memory) to
keep the previous values?
Should I try to implement prefix delta encoding for values?
