hbase-user mailing list archives

From David Koch <ogd...@googlemail.com>
Subject Custom versioning best practices
Date Thu, 22 Nov 2012 13:55:58 GMT

I was thinking of using versions with custom timestamps to store the
evolution of a column value - as opposed to creating several (time_t,
value_at_time_t) qualifier-value pairs. The value to be stored is a single
integer. Fast ad-hoc retrieval of multiple versions based on a row key +
filter [1] (i.e., through a web service) is important. The number of row keys
will be between 10^6 and 10^9.
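
To make the two layouts concrete, here is a minimal in-memory sketch in plain
Python. It does not use the HBase client at all. The function names and the
dict-based "tables" are hypothetical and only model the idea. The half-open
time range and the newest-first ordering are meant to mirror how HBase returns
versions, as far as I understand it.

```python
import struct

# Layout A: one qualifier, many versions.
# Maps (row, qualifier) -> {timestamp: value}.
def put_version(table, row, qualifier, ts, value):
    table.setdefault((row, qualifier), {})[ts] = value

def get_versions(table, row, qualifier, ts_min, ts_max, max_versions=None):
    """Return (ts, value) pairs with ts_min <= ts < ts_max, newest first."""
    cells = table.get((row, qualifier), {})
    hits = sorted(
        ((ts, v) for ts, v in cells.items() if ts_min <= ts < ts_max),
        reverse=True,
    )
    return hits if max_versions is None else hits[:max_versions]

# Layout B: one qualifier per time point.
# The qualifier is the timestamp encoded as 8 big-endian bytes, so that for
# non-negative timestamps the byte order of qualifiers equals time order.
def put_qualifier(table, row, ts, value):
    table.setdefault(row, {})[struct.pack(">q", ts)] = value

def get_qualifiers(table, row, ts_min, ts_max):
    """Return (ts, value) pairs with ts_min <= ts < ts_max, oldest first."""
    lo, hi = struct.pack(">q", ts_min), struct.pack(">q", ts_max)
    cols = table.get(row, {})
    return [
        (struct.unpack(">q", q)[0], v)
        for q, v in sorted(cols.items())
        if lo <= q < hi
    ]
```

In layout A the time dimension lives in the cell timestamp, so a read is a
time-range restriction on a single column. In layout B it lives in the
qualifier, so a read is a qualifier-range restriction within the row.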

a) If the number of versions (timestamps) is moderate, can I expect
read/filtering performance to be better than when using multiple
qualifier/value pairs?
b) For a larger number of versions, say 365, what precautions, if any, should
I take with respect to the HBase/table setup?

I looked around a bit and found the following:

The documentation [2] mentions that the maximum number of versions should
not be too high ("in the hundreds"). The O'Reilly HBase book [3], on the
other hand, mentions that Facebook use(d) versions to store inbox messages
in order. Clearly, the number of messages may grow quite large (>> 100). Is
[1] still valid with more recent versions of HBase?

Thank you,


[2] http://hbase.apache.org/book/schema.versions.html
[3] 1st edition, page 384
