hbase-user mailing list archives

From Patrick Angeles <patr...@cloudera.com>
Subject Re: Suggested and max number of CFs per table
Date Thu, 17 Mar 2011 15:26:35 GMT

Perhaps your biggest issue will be the need to disable the table to add a
new CF. So effectively you need to bring down the application to move in a
new tenant.

Another thing with multiple CFs is that if one CF tends to get
disproportionately more data, you will get a lot of region splitting, and the
other CFs will end up with very small HFiles in each region.
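
The splitting effect can be sketched with a toy model. This is not HBase code: it assumes uniformly distributed row keys (so every region gets an equal share of each CF's writes) and that a region splits in half once any single store exceeds a size threshold, loosely analogous to hbase.hregion.max.filesize. The 256 MB threshold, the CF names and the write rates are made up for illustration.

```python
def simulate_splits(write_mb_per_step, steps, max_store_mb):
    """Toy model: regions split when their largest CF store gets too big.

    Assumptions (not HBase internals): keys are uniform, so each region
    receives an equal share of every CF's writes, and a region splits in
    half as soon as any one of its stores exceeds max_store_mb.
    """
    regions = [{cf: 0.0 for cf in write_mb_per_step}]
    for _ in range(steps):
        n = len(regions)
        for region in regions:
            for cf, mb in write_mb_per_step.items():
                region[cf] += mb / n
        next_regions = []
        for region in regions:
            if max(region.values()) > max_store_mb:
                # A split divides every CF's data, including the small ones.
                half = {cf: size / 2 for cf, size in region.items()}
                next_regions.extend([half, dict(half)])
            else:
                next_regions.append(region)
        regions = next_regions
    return regions


regions = simulate_splits({"busy_cf": 100.0, "quiet_cf": 1.0}, steps=100, max_store_mb=256.0)
print(len(regions), "regions")                                # 64 regions
print("busy_cf per region:", regions[0]["busy_cf"], "MB")    # 156.25 MB
print("quiet_cf per region:", regions[0]["quiet_cf"], "MB")  # 1.5625 MB
```

The busy CF alone drives the region count up to 64, and the quiet CF's 100 MB ends up spread across all of them at roughly 1.5 MB per region.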

I think the only reasonable use of CFs is when you really need row-level
atomicity across CFs. Otherwise just use multiple tables.
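
With multiple tables, each retention tier could be its own table with a single CF whose TTL matches that tier. A minimal routing sketch follows; the table names and the tier list are hypothetical, and "1 month" is approximated as 30 days. The only HBase fact it relies on is that a column family's TTL is specified in seconds.

```python
DAY_SECONDS = 24 * 60 * 60

# Hypothetical retention tiers: (retention in days, table name).
# Each table would hold one CF whose TTL equals the tier's retention.
RETENTION_TABLES = [
    (30, "events_30d"),
    (180, "events_180d"),
    (365, "events_365d"),
]


def ttl_seconds(days):
    """HBase column family TTLs are expressed in seconds."""
    return days * DAY_SECONDS


def table_for_tenant(retention_days):
    """Pick the shortest tier that still keeps data for retention_days."""
    for days, table in RETENTION_TABLES:
        if retention_days <= days:
            return table
    raise ValueError(f"no tier retains data for {retention_days} days")


for days, table in RETENTION_TABLES:
    print(table, "TTL =", ttl_seconds(days), "seconds")
print(table_for_tenant(30))  # events_30d
print(table_for_tenant(90))  # events_180d
```

A tenant whose policy falls between tiers (90 days here) keeps data somewhat longer than requested; if a policy requires prompt deletion, that tenant would need an exact-match tier instead.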

On Thu, Mar 17, 2011 at 2:30 AM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hi,
> My Q is around the suggested or maximum number of CFs per table (see
> http://hbase.apache.org/book/schema.html#number.of.cfs )
> Consider the following use-case.
> * A multi-tenant system.
> * All tenants write data to the same table.
> * Tenants have different data retention policies.
> For the above use case I thought one could then just have different CFs with
> different TTLs, because Stack suggested relying on HBase's ability to purge old
> rows by applying CF-specific TTLs: http://search-hadoop.com/m/VAeb52cvWHV.
> These CFs would have the same set of columns, just different TTLs. Then tenants
> who want to keep only the last month's worth of data go to the CF where TTL=1
> month, tenants who want to keep the last 6 months of data go to the CF where
> TTL=6 months, and so on. However, tenants are not going to be evenly
> distributed: there will be more tenants with shorter data retention periods,
> which means the CFs holding these tenants' data will grow faster.
> If I'm reading http://hbase.apache.org/book/schema.html#number.of.cfs correctly,
> the advice is not to have more than 2-3 CFs per table?
> And what happens if I have, say, 6 CFs per table?
> Again, if I read the above page correctly, the problem is that uneven data
> distribution will mean that whenever 1 of my CFs needs to be flushed, the
> remaining 5 CFs will also get flushed at the same time, and this may (or
> will?) trigger compaction for all CFs' files, creating a sudden IO hit?
> Is there a good solution for this problem?
> Should one then have 6 different tables, each with just 1 CF, instead of
> having 1 table with 6 CFs?
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
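
The flush concern raised in the quoted message can be illustrated with a toy model that assumes flushes happen per region, so once the region's combined memstore reaches a flush size (loosely analogous to hbase.hregion.memstore.flush.size) every CF writes out a file, however little data it holds. The 64 MB flush size, CF names and write rates are assumptions chosen for illustration, not measurements.

```python
def simulate_flushes(write_mb_per_step, flush_size_mb, steps):
    """Toy model of region-wide flushes across several CFs.

    Assumption: a flush is triggered by the region's total memstore size,
    and when it fires, every CF with buffered data writes one file.
    Returns the sizes (MB) of the files written per CF.
    """
    memstore = {cf: 0.0 for cf in write_mb_per_step}
    files = {cf: [] for cf in write_mb_per_step}
    for _ in range(steps):
        for cf, mb in write_mb_per_step.items():
            memstore[cf] += mb
        if sum(memstore.values()) >= flush_size_mb:
            # Region-level flush: small CFs are flushed along with the big one.
            for cf in memstore:
                if memstore[cf] > 0:
                    files[cf].append(memstore[cf])
                memstore[cf] = 0.0
    return files


files = simulate_flushes({"cf_1m": 12.0, "cf_6m": 3.0, "cf_12m": 1.0}, flush_size_mb=64.0, steps=40)
for cf, sizes in files.items():
    print(cf, len(sizes), "files of", sizes[0], "MB")
# cf_1m 10 files of 48.0 MB
# cf_6m 10 files of 12.0 MB
# cf_12m 10 files of 4.0 MB
```

In this model every CF's file count grows in lockstep, so a compaction trigger based on the number of store files would fire for all CFs at about the same time, even though the slow CFs hold only small files.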
