hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject RE: Question from HBase book: "HBase currently does not do well with anything about two or three column families"
Date Thu, 02 Jun 2011 22:06:10 GMT
Re: "Is that still considered current?  Do folks on the list generally agree with that guideline?"

Yes and yes.  HBase runs better with fewer CFs.



-----Original Message-----
From: Leif Wickland [mailto:leifwickland@gmail.com] 
Sent: Thursday, June 02, 2011 5:41 PM
To: user@hbase.apache.org
Subject: Question from HBase book: "HBase currently does not do well with anything about two or three column families"

I was reading through the HBase book and came across the following in *6.2. On the number
of column families* <http://hbase.apache.org/book.html#number.of.cfs>:

*"HBase currently does not do well with anything about two or three column families so keep
the number of column families in your schema low.
Currently, flushing and compactions are done on a per Region basis so if one column family
is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed
though the amount of data they carry is small. Compaction is currently triggered by the total
number of files under a column family. Its not size based. When many column families the flushing
and compaction interaction can make for a bunch of needless i/o loading (To be addressed by
changing flushing and compaction to work on a per column family basis).*

*Try to make do with one column famliy if you can in your schemas. Only introduce a second
and third column family in the case where data access is usually column scoped; i.e. you query
one column family or the other but usually not both at the one time."*
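
To make the flush behavior in the quoted passage concrete, here is a toy simulation. It is
only a sketch and not HBase code: the function name, the byte counts, and the single
region-wide threshold are all assumptions chosen for illustration. It models one region
whose column families are all flushed together whenever their combined in-memory data
crosses a threshold, and records the size of each file written.

```python
# Toy model only: not HBase code. All names and numbers are hypothetical.

def simulate_flushes(write_bytes_per_cf, flush_threshold, num_rounds):
    """Return the sizes of the files flushed for each column family.

    write_bytes_per_cf: bytes written to each column family per round.
    flush_threshold: combined in-memory bytes across the region that
        triggers a flush (assumed to be a single region-wide limit).
    num_rounds: number of write rounds to simulate.
    """
    in_memory = {cf: 0 for cf in write_bytes_per_cf}
    flushed_files = {cf: [] for cf in write_bytes_per_cf}

    for _ in range(num_rounds):
        # Every family receives its share of writes this round.
        for cf, nbytes in write_bytes_per_cf.items():
            in_memory[cf] += nbytes

        # Region-wide flush: once the total crosses the threshold, every
        # family with buffered data writes a file, however small.
        if sum(in_memory.values()) >= flush_threshold:
            for cf in in_memory:
                if in_memory[cf] > 0:
                    flushed_files[cf].append(in_memory[cf])
                    in_memory[cf] = 0

    return flushed_files


# One family carries almost all of the data; two carry very little.
writes = {"big": 1000, "small_a": 10, "small_b": 10}
files = simulate_flushes(writes, flush_threshold=10_000, num_rounds=100)

for cf, sizes in files.items():
    print(cf, "files:", len(sizes), "bytes per file:", sizes[0])
```

In this model every family ends up with the same number of files, but the files of the two
small families are a hundredth the size of the large family's. Since the passage says
compaction is triggered by file count rather than size, the small families would reach that
trigger just as often as the large one, which is the needless i/o it describes.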

Is that still considered current?  Do folks on the list generally agree with that guideline?

The reason that I ask is that I'm designing a data model which currently has five column families.
I expect each of those column families to have divergent read and write patterns.  Do you
think I should look for ways to reduce the number of CFs?

Thanks,

Leif Wickland