hbase-user mailing list archives

From Sachin Jain <sachinjain...@gmail.com>
Subject Downsides of having large number of versions in hbase
Date Tue, 29 Nov 2016 10:37:00 GMT

I am curious about the impact of keeping a large number of versions in
HBase. Suppose I want to retain the previous 100 versions for a row/cell.

My thoughts are:

Having a large number of versions means a larger number of HFiles.
A larger number of HFiles can increase the lookup time for a rowKey.

  Hypothesis 1: The region server has to check each HFile for the presence
of that rowKey and then, based on timestamp, it will pick out the latest
version.

  Hypothesis 2: The region server may not scan every HFile. Going by HFile
creation time, as soon as it finds the rowKey in the most recently created
HFile, it stops scanning further HFiles, because we are interested only in
the latest version and have already found it in the most recent file.

I would like to confirm which of 1 and 2 is true.
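
To make the two hypotheses concrete, here is a toy sketch in Python. It is
not HBase code and does not claim to show what the region server actually
does; the "HFile" dicts, the "created" field and both function names are
made up purely for illustration.

```python
# Toy model only: each "HFile" is a dict with a creation order and a
# mapping of row key -> list of (timestamp, value) cells. All hypothetical.
hfiles = [
    {"created": 1, "cells": {"row1": [(100, "v1")], "row2": [(105, "a")]}},
    {"created": 2, "cells": {"row1": [(200, "v2")]}},
    {"created": 3, "cells": {"row2": [(300, "b")]}},
]


def latest_checking_all_files(files, row):
    """Hypothesis 1: look in every file, keep the cell with the highest timestamp."""
    best, checked = None, 0
    for f in files:
        checked += 1
        for cell in f["cells"].get(row, []):
            if best is None or cell[0] > best[0]:
                best = cell
    return best, checked


def latest_stopping_early(files, row):
    """Hypothesis 2: visit files newest first, stop at the first one holding the row."""
    checked = 0
    for f in sorted(files, key=lambda f: f["created"], reverse=True):
        checked += 1
        cells = f["cells"].get(row)
        if cells:
            # Newest cell within this one file; older files are never opened.
            return max(cells), checked
    return None, checked


# Each call returns (latest cell found, number of files looked at).
print(latest_checking_all_files(hfiles, "row1"))
print(latest_stopping_early(hfiles, "row1"))
```

In this toy data both functions return the same cell for row1 and differ
only in how many files they touch. They would disagree if a more recently
created file held a cell with an older timestamp than one in an earlier
file; whether that case matters in HBase is part of what I am asking.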

Similarly, a large number of versions can also degrade the performance of
full scans, e.g. for joins.

