hbase-user mailing list archives

From Christian Schäfer <syrious3...@yahoo.de>
Subject RE: Schema Design - Move second column family to new table
Date Mon, 20 Aug 2012 20:54:56 GMT
Thanks, Pranav, for the schema design resource. I will check it out soon.


Thanks, Ian, for your thoughts. You're right that the point about transactions is really important.

On the other hand, because compaction is per-region, big scans over CF2 (the CF with only a few
rows set) would result in several disk seeks.

So I still have to find out whether big scans over CF2 are really as important as I currently expect.
Then again, I guess that (in our use case) transaction security is more important than speed of


From: Ian Varley <ivarley@salesforce.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
CC: Christian Schäfer <syrious3000@yahoo.de>
Sent: Monday, 20 August 2012, 16:37
Subject: Re: Schema Design - Move second column family to new table


Column families are really more "within" rows, not the other way around (they're really just
a way to physically partition sets of columns in a table). In your example, then, it's more
correct to say that table1 has millions / billions of rows, but only hundreds of them have
any columns in CF2. I'm not exactly sure how much of a penalty that 2nd column family imposes
in this case--if you don't include it as a part of your scans / gets, then you won't pay any
penalty at read time; but if you're reading from both "just in case" the row has data there,
you'll always take a hit. I think the same goes for writes. (Question for the list: does adding
a column family that you *never* use impose any penalties?)
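
To make the "column families live within rows" picture concrete, here is a toy in-memory model in Python. It is only a sketch and not the HBase client API: the `ToyTable` class, its methods, and the row counts are made up for illustration. It keeps one store per column family and counts which stores a scan consults, so a scan that asks only for cf1 never looks at cf2.

```python
from collections import defaultdict


class ToyTable:
    """Toy model with one store per column family (hypothetical, not the HBase API)."""

    def __init__(self, families):
        # family -> {row_key -> {qualifier -> value}}
        self.stores = {family: {} for family in families}
        # how many times each family's store was consulted by a scan
        self.store_reads = defaultdict(int)

    def put(self, row_key, family, qualifier, value):
        self.stores[family].setdefault(row_key, {})[qualifier] = value

    def scan(self, families):
        """Return {row_key: {family: columns}}, reading only the requested stores."""
        result = {}
        for family in families:
            self.store_reads[family] += 1
            for row_key, columns in self.stores[family].items():
                result.setdefault(row_key, {})[family] = dict(columns)
        return dict(sorted(result.items()))


table = ToyTable(["cf1", "cf2"])
for i in range(1000):
    table.put(f"row{i:04d}", "cf1", "q", i)        # every row has cf1 data
for i in range(0, 1000, 100):
    table.put(f"row{i:04d}", "cf2", "q", "rare")   # only 10 rows have cf2 data

cf1_only = table.scan(["cf1"])
cf2_reads_after_cf1_scan = table.store_reads["cf2"]  # cf2 store was not touched
rows_with_cf2 = table.scan(["cf2"])
```

In this model the table has 1000 rows, of which only 10 have any columns in cf2, which mirrors the "millions of rows, hundreds with CF2" situation at a small scale.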

The downside to moving it to another table is, writes will no longer be transactionally protected
(i.e. if you're trying to write to both, it could fail after one and before the other). Conversely,
if you put them as column families in the same row, writes to a single row are transactional.
You may or may not care about that.
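
The difference can be illustrated with a small Python simulation. This is a toy model under stated assumptions, not HBase code: `ToyStore`, `put_row`, `write_two_tables`, and the `fail` flags are hypothetical names used only to inject a failure at a chosen point. A single-row write is applied all or nothing, while a write split across two tables can leave the first table updated and the second not.

```python
class WriteFailed(Exception):
    """Raised by the toy store to simulate a failed write."""


class ToyStore:
    """Toy table: row_key -> {family -> {qualifier -> value}} (not the HBase API)."""

    def __init__(self):
        self.rows = {}

    def put_row(self, row_key, cells, fail=False):
        """Apply all cells of one row together, or none of them."""
        if fail:
            # Simulated failure: nothing from this put becomes visible.
            raise WriteFailed(row_key)
        row = self.rows.setdefault(row_key, {})
        for (family, qualifier), value in cells.items():
            row.setdefault(family, {})[qualifier] = value


def write_two_tables(t1, t2, row_key, cells1, cells2, fail_second=False):
    """Two independent puts: the first can succeed while the second fails."""
    t1.put_row(row_key, cells1)
    t2.put_row(row_key, cells2, fail=fail_second)


# Option 1: one table, two column families, one row-level write that fails.
single = ToyStore()
try:
    single.put_row("k1", {("cf1", "a"): 1, ("cf2", "b"): 2}, fail=True)
except WriteFailed:
    pass

# Option 2: two tables with the same row key; the second write fails.
table1, table2 = ToyStore(), ToyStore()
try:
    write_two_tables(table1, table2, "k1",
                     {("cf1", "a"): 1}, {("cf1", "b"): 2}, fail_second=True)
except WriteFailed:
    pass
```

After the simulated failures, `single` contains nothing for "k1", whereas `table1` holds the row and `table2` does not, so a reader can observe a half-applied update in the two-table layout.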

So, putting the lower cardinality data in another table with the same row key might be a performance
win, or it might not, depending on your read & write patterns. Try it both ways and compare,
and let us know what you find.


On Aug 20, 2012, at 7:25 AM, Pranav Modi wrote:

This might be useful -

On Mon, Aug 20, 2012 at 5:17 PM, Christian Schäfer <syrious3000@yahoo.de> wrote:

Currently I'm about to design HBase tables.

In my case there is table1 with CF1 holding millions/billions of rows and
CF2 with hundreds of rows.
Read use cases include reading both CF data by key or reading only one CF.

Referring to http://hbase.apache.org/book/number.of.cfs.html

Due to the cardinality difference I would change the schema design by
putting CF2 in an extra table (table 2), right?
So after that there are table1 and table2 each with one CF with the same
row key.
Any doubts about that?

Can anyone recommend resources about HBase schema design that explain it
for different use cases,
beyond "HBase: The Definitive Guide" and the HBase online reference?

