hbase-user mailing list archives

From Leif Wickland <leifwickl...@gmail.com>
Subject Question from HBase book: "HBase currently does not do well with anything about two or three column families"
Date Thu, 02 Jun 2011 21:40:35 GMT
I was reading through the HBase book and came across the following in section
6.2, "On the number of column families"
<http://hbase.apache.org/book.html#number.of.cfs>:

*"HBase currently does not do well with anything above two or three column
families so keep the number of column families in your schema low.
Currently, flushing and compactions are done on a per Region basis so if one
column family is carrying the bulk of the data bringing on flushes, the
adjacent families will also be flushed though the amount of data they carry
is small. Compaction is currently triggered by the total number of files
under a column family. It's not size based. When many column families exist,
the flushing and compaction interaction can make for a bunch of needless i/o
loading (To be addressed by changing flushing and compaction to work on a
per column family basis).*

*Try to make do with one column family if you can in your schemas. Only
introduce a second and third column family in the case where data access is
usually column scoped; i.e. you query one column family or the other but
usually not both at the one time."*
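
To make the quoted flush behavior concrete, here is a toy Python model of it. This is not HBase code: the function name, the family names, and the byte counts are all made up for illustration, and it assumes a flush is triggered when the region's total in-memory size reaches a threshold, at which point every family with buffered data writes one file.

```python
def simulate_region_flushes(bytes_per_step, flush_threshold, steps):
    """Toy model of per-region flushing (an assumption, not HBase internals).

    bytes_per_step maps a hypothetical column family name to the bytes it
    receives on each step. Returns the sizes of the files each family flushed.
    """
    memstore = {cf: 0 for cf in bytes_per_step}
    flushed_files = {cf: [] for cf in bytes_per_step}
    for _ in range(steps):
        for cf, n in bytes_per_step.items():
            memstore[cf] += n
        # The flush decision is made for the region as a whole,
        # so every family flushes together regardless of its own size.
        if sum(memstore.values()) >= flush_threshold:
            for cf in memstore:
                if memstore[cf] > 0:
                    flushed_files[cf].append(memstore[cf])
                    memstore[cf] = 0
    return flushed_files


# One heavy family and one light family sharing a region (made-up numbers).
rates = {"big": 1000, "small": 10}
files = simulate_region_flushes(rates, flush_threshold=10_100, steps=100)
for cf, sizes in files.items():
    print(cf, len(sizes), sizes[0])
```

In this model both families end up with the same number of files, but the files of the light family are tiny. Because compaction in the quoted description is triggered by file count rather than size, the light family would be compacted as often as the heavy one.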

Is that still considered current?  Do folks on the list generally agree with
that guideline?

The reason that I ask is that I'm designing a data model which currently has
five column families.  I expect each of those column families to have
divergent read and write patterns.  Do you think I should look for ways to
reduce the number of CFs?


Leif Wickland
