hbase-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Suggested and max number of CFs per table
Date Thu, 17 Mar 2011 06:30:14 GMT

My question is about the suggested or maximum number of CFs (column families) per table (see 
http://hbase.apache.org/book/schema.html#number.of.cfs).

Consider the following use-case.
* A multi-tenant system.
* All tenants write data to the same table.
* Tenants have different data retention policies.

For the above use case I thought one could simply have multiple CFs with different 
TTLs, since Stack suggested relying on HBase's ability to purge old rows via 
CF-specific TTLs: http://search-hadoop.com/m/VAeb52cvWHV.  These CFs would hold 
the same set of columns, just with different TTLs.  Tenants who want to keep only 
the last month's worth of data would go into the CF with TTL=1 month, tenants who 
want to keep the last 6 months would go into the CF with TTL=6 months, and so on.  
However, tenants will not be evenly distributed: there will be more tenants with 
shorter retention periods, so the CFs holding their data will grow faster than 
the others.
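In HBase shell terms, the scheme above might look roughly like this (the table and CF names are made up for illustration; HBase TTLs are specified in seconds):

```ruby
# Hypothetical multi-tenant table: one CF per retention tier,
# each with its own TTL so HBase purges expired rows on its own.
create 'tenant_data',
  { NAME => 'ttl_1m', TTL => 2592000 },    # ~1 month  (30 days in seconds)
  { NAME => 'ttl_6m', TTL => 15552000 }    # ~6 months (180 days in seconds)
```

Each tenant's rows would then be written only to the CF matching that tenant's retention policy.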

If I'm reading http://hbase.apache.org/book/schema.html#number.of.cfs correctly, 
the advice is not to have more than 2-3 CFs per table?
And what happens if I have say 6 CFs per table?

Again, if I read that page correctly, the problem is that with uneven data 
distribution, whenever one of my CFs needs to be flushed, the remaining 5 CFs 
get flushed at the same time, and this may (or will?) trigger compactions across 
all the CFs' files, creating a sudden I/O hit?

Is there a good solution for this problem?
Should one then have 6 different tables, each with just 1 CF, instead of 1 table 
with 6 CFs?

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
