hbase-user mailing list archives

From Sachin Jain <sachinjain...@gmail.com>
Subject Re: Downsides of having large number of versions in hbase
Date Thu, 01 Dec 2016 06:43:10 GMT
I found the following snippet in the HBase book [0]:

It is not recommended setting the number of max versions to an exceedingly
high level (e.g., hundreds or more) unless those old values are very dear
to you because this will greatly increase StoreFile size.
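
To make the size concern concrete, here is a rough back-of-the-envelope sketch in Python. The helper name and the workload numbers are illustrative assumptions, not HBase figures. It ignores compression, block encoding, and index overhead, and only shows that raw cell data grows linearly with the number of versions retained.

```python
def estimated_cell_bytes(rows, columns, versions, bytes_per_cell):
    """Rough upper bound on raw cell data when every cell keeps `versions` versions.

    Hypothetical helper: ignores compression, block encoding, and index overhead.
    """
    return rows * columns * versions * bytes_per_cell


# Assumed workload: 1 million rows, 10 columns, 100 bytes per stored cell.
one_version = estimated_cell_bytes(1_000_000, 10, 1, 100)
hundred_versions = estimated_cell_bytes(1_000_000, 10, 100, 100)

# How many times larger the raw data is with 100 versions than with 1.
growth_factor = hundred_versions // one_version
```

The bound is only reached if every cell is actually written 100 times; cells that are rarely updated contribute far less.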

Does this validate hypothesis #2 above?

[0]: http://hbase.apache.org/book.html#schema.versions

On Tue, Nov 29, 2016 at 4:07 PM, Sachin Jain <sachinjain024@gmail.com> wrote:

> Hi,
> I am curious about the impact of keeping a large number of versions
> in HBase. Suppose I want to retain the previous 100 versions of a cell.
> My thoughts are:
> - A large number of versions means more HFiles.
> - More HFiles can increase the lookup time for a rowKey.
>   Hypothesis 1: The region server has to check each HFile for the
> presence of that rowKey and then, based on timestamps, pick the latest
> version.
>   Hypothesis 2: The region server may not scan every HFile. Going by the
> creation time of each HFile, as soon as it finds the rowKey in the most
> recently created HFile, it stops scanning further HFiles, because we are
> interested only in the latest version and have already found it in the
> newest file.
> I want to confirm which of 1 and 2 is true.
> Similarly, a large number of versions can also degrade the performance of
> full scans for joins etc.
> Thanks
> -Sachin
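
The two hypotheses in the quoted message can be contrasted with a toy model. This is a minimal Python sketch under stated assumptions, not a description of HBase's actual read path: `ToyFile`, `latest_checking_all`, and `latest_newest_first` are hypothetical names, and each "file" is just a dictionary from row key to a list of (timestamp, value) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """Stand-in for a store file: maps row key -> list of (timestamp, value)."""
    cells: dict = field(default_factory=dict)


def latest_checking_all(files, row_key):
    """Hypothesis 1: look in every file and keep the highest timestamp seen."""
    best = None
    files_checked = 0
    for f in files:
        files_checked += 1
        for cell in f.cells.get(row_key, []):
            if best is None or cell[0] > best[0]:
                best = cell
    return best, files_checked


def latest_newest_first(files, row_key):
    """Hypothesis 2: visit files newest first and stop at the first hit.

    Assumption: a newer file never holds an older timestamp for the row
    than an older file does. If that fails, this returns a stale version.
    """
    files_checked = 0
    for f in reversed(files):  # `files` is assumed ordered oldest -> newest
        files_checked += 1
        cells = f.cells.get(row_key)
        if cells:
            return max(cells, key=lambda c: c[0]), files_checked
    return None, files_checked


# Three toy files, oldest first; only the first two contain "row1".
files = [
    ToyFile({"row1": [(1, "a")]}),
    ToyFile({"row1": [(2, "b")]}),
    ToyFile({"row2": [(3, "x")]}),
]
result_all = latest_checking_all(files, "row1")
result_newest = latest_newest_first(files, "row1")
```

In this toy setup both strategies return the same cell, and the newest-first one touches fewer files. The early exit is only safe under the ordering assumption in the docstring. Because HBase clients can supply cell timestamps explicitly, file creation order alone does not guarantee timestamp order.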
