On Wed, Jan 7, 2015 at 9:27 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:
>
> Data/table layout:
> * HBase is used for storing metrics at different granularities (1 min,
> 5 min, ... a total of 6 different granularities)
> * It's a multi-tenant system
> * Keys are carefully crafted and include userId + number, where this number
> contains the time and the granularity
> * Everything's in 1 table and 1 CF
>
> Access:
> * We only access 1 system at a time, for a specific time range, and
> specific granularity
> * We periodically scan ALL data and delete data older than N days, where N
> varies from user to user
> * We periodically scan ALL data and merge multiple rows (of the same
> granularity) into 1
>
>
Otis, is there a specific problem you are trying to solve?
> Question:
> Would there be any advantage in having 6 tables - one for each granularity
> - instead of having everything in 1 table?
>
It could make for less rewriting of data. With everything in the one table,
a compaction will rewrite all granularities. With separate tables, the
coarser granularities would change less often, so they would flush and
compact (that is, be rewritten) less often.
You might get a similar effect if you put in place a split policy that splits
regions on a granularity border; e.g. have all the 1-minute data in one
region and anything coarser go into a different region.
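To make the granularity-border idea concrete, here is a minimal sketch. It computes fixed split points up front rather than implementing a custom split policy, so it is a simpler stand-in for the suggestion above. The row-key layout (4-byte userId, 1-byte granularity code, 8-byte timestamp) is an assumption for illustration only, as are the names row_key, border_split_keys and region_index; the real keys in this system may be laid out differently.

```python
import bisect
import struct

# Hypothetical layout (an assumption, not the layout from this thread):
# 4-byte big-endian userId | 1-byte granularity code | 8-byte big-endian timestamp
GRANULARITY_CODES = {"1min": 0, "5min": 1}  # only two of the six granularities shown


def row_key(user_id, granularity, ts):
    """Build a row key that sorts by user, then granularity, then time."""
    return struct.pack(">IBQ", user_id, GRANULARITY_CODES[granularity], ts)


def border_split_keys(user_ids, first_coarse="5min"):
    """Two split keys per user: where the user's rows start, and where the
    first coarser granularity starts."""
    keys = []
    for uid in sorted(user_ids):
        keys.append(struct.pack(">I", uid))
        keys.append(struct.pack(">IB", uid, GRANULARITY_CODES[first_coarse]))
    return keys


def region_index(key, split_keys):
    """Index of the region holding `key`; region i covers [split[i-1], split[i])."""
    return bisect.bisect_right(split_keys, key)


splits = border_split_keys([1, 2])
fine = region_index(row_key(1, "1min", 1420622820), splits)
coarse = region_index(row_key(1, "5min", 1420622820), splits)
print(fine, coarse)  # the two indices differ: 1-minute and coarser rows are in separate regions
```

Split keys like these could be supplied as pre-split points when the table is created. Note that with userId leading the key, this gives two regions per user, which may be too many in a multi-tenant system; that is one reason to check the relative proportions first.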
Do you have a sense of the relative proportions of the different
granularities? (E.g., is the coarsest granularity 10% of the data, or an
irrelevant 0.0001%?)
Otherwise, as @tsuna says. And yes, what @nick says regarding compaction
might be worth exploring; it could save you a bunch of churn.
St.Ack
> Assume each table would still have just 1 CF and the keys would remain the
> same.
>