hbase-user mailing list archives

From Christian Schäfer <syrious3...@yahoo.de>
Subject Re: Schema Design - Move second column family to new table
Date Wed, 22 Aug 2012 08:05:30 GMT
Just a short follow-up.

As mentioned, I will now use two column families (instead of an additional table) to achieve
row-level atomicity.

Because CF1 has a much higher cardinality than CF2, flushes will most likely always be triggered
by CF1's memstore reaching the configured flush size.
Thus CF2 will be flushed as well, resulting in very small HFiles, because for every 1000 rows
set in CF1 there is only ~1 row set in CF2.
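
To get a feeling for how small those CF2 HFiles might be, here is a rough back-of-the-envelope sketch. Only the 1000:1 ratio comes from the message above; the flush size and the average row sizes are assumptions chosen purely for illustration, and the function name is hypothetical. On-disk HFile sizes would also differ from memstore sizes (compression, encoding), so treat the result as an order of magnitude at best.

```python
# Back-of-the-envelope estimate; all constants except the ratio are assumptions.
FLUSH_SIZE_BYTES = 128 * 1024 * 1024  # assumed value of hbase.hregion.memstore.flush.size
CF1_ROWS_PER_CF2_ROW = 1000           # ratio quoted in the message above
AVG_ROW_BYTES_CF1 = 1024              # hypothetical average in-memory size of a CF1 row
AVG_ROW_BYTES_CF2 = 1024              # hypothetical average in-memory size of a CF2 row


def estimate_cf2_flush_bytes(flush_size, ratio, cf1_row_bytes, cf2_row_bytes):
    """Estimate how much CF2 data is flushed when CF1 alone fills the memstore."""
    cf1_rows = flush_size // cf1_row_bytes  # CF1 rows buffered when the flush triggers
    cf2_rows = cf1_rows // ratio            # CF2 rows that arrived in the same period
    return cf2_rows * cf2_row_bytes


cf2_bytes = estimate_cf2_flush_bytes(
    FLUSH_SIZE_BYTES, CF1_ROWS_PER_CF2_ROW, AVG_ROW_BYTES_CF1, AVG_ROW_BYTES_CF2
)
print(f"CF2 data per flush: ~{cf2_bytes / 1024:.0f} KiB")
```

Under these assumptions each flush would add a CF2 file of roughly a hundred KiB, which is tiny compared to the CF1 file written by the same flush.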


Does anyone have experience with whether this becomes a performance problem when doing a scan
restricted to CF2 (which means checking many small HFiles), assuming bloom filters are applied?



regards,
Christian


----- Original Message -----
From: Christian Schäfer <syrious3000@yahoo.de>
To: "user@hbase.apache.org" <user@hbase.apache.org>
CC: 
Sent: 22:54 Monday, 20 August 2012
Subject: RE: Schema Design - Move second column family to new table

Thanks, Pranav, for the schema design resource. I will check it out soon.

And thanks, Ian, for your thoughts. You're right that the point about transactions is really important.

On the other hand, because compaction happens per region, big scans over CF2 (the CF with only
a few rows set) would result in several disk seeks.

So I still have to find out whether big scans over CF2 are really as important as I currently expect.
That said, I guess that (in our use case) transactional safety is more important than the speed
of analytics.


regards
Chris.



________________________________
From: Ian Varley <ivarley@salesforce.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
CC: Christian Schäfer <syrious3000@yahoo.de>
Sent: 16:37 Monday, 20 August 2012
Subject: Re: Schema Design - Move second column family to new table

Christian,

Column families are really more "within" rows, not the other way around (they're really just
a way to physically partition sets of columns in a table). In your example, then, it's more
correct to say that table1 has millions / billions of rows, but only hundreds of them have
any columns in CF2. I'm not exactly sure how much of a penalty that 2nd column family imposes
in this case--if you don't include it as a part of your scans / gets, then you won't pay any
penalty at read time; but if you're reading from both "just in case" the row has data there,
you'll always take a hit. I think the same goes for writes. (Question for the list: does adding
a column family that you *never* use impose any penalties?)

The downside to moving it to another table is, writes will no longer be transactionally protected
(i.e. if you're trying to write to both, it could fail after one and before the other). Conversely,
if you put them as column families in the same row, writes to a single row are transactional.
You may or may not care about that.
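
The difference between the two layouts can be illustrated with a toy in-memory model. This is not the HBase client API: `ToyTable`, `write_two_tables` and `write_one_row` are hypothetical names, and the "failure" is simulated. The sketch only assumes what is stated above, namely that a put to a single row is applied as a whole, while puts to two tables are independent operations.

```python
class ToyTable:
    """In-memory stand-in for a table: row key -> {"cf:qualifier": value}."""

    def __init__(self, fail_on_put=False):
        self.rows = {}
        self.fail_on_put = fail_on_put  # simulate a failing write

    def put(self, row_key, cells):
        """Apply all cells of one row together, or none of them if the put fails."""
        if self.fail_on_put:
            raise IOError("simulated write failure")
        self.rows.setdefault(row_key, {}).update(cells)


def write_two_tables(table1, table2, row_key, cf1_cells, cf2_cells):
    """Two separate puts: the second can fail after the first has succeeded."""
    table1.put(row_key, cf1_cells)
    table2.put(row_key, cf2_cells)


def write_one_row(table, row_key, cf1_cells, cf2_cells):
    """One put carrying the cells of both column families of the same row."""
    table.put(row_key, {**cf1_cells, **cf2_cells})


# Layout A: two tables. The write to table2 fails after table1 was updated.
t1, t2 = ToyTable(), ToyTable(fail_on_put=True)
try:
    write_two_tables(t1, t2, "row1", {"cf1:a": "x"}, {"cf2:b": "y"})
except IOError:
    pass
partial_state = "row1" in t1.rows and "row1" not in t2.rows

# Layout B: one table, two column families. A failing put leaves nothing behind.
single = ToyTable(fail_on_put=True)
try:
    write_one_row(single, "row1", {"cf1:a": "x"}, {"cf2:b": "y"})
except IOError:
    pass
nothing_written = "row1" not in single.rows

print(partial_state, nothing_written)
```

In layout A the application has to detect and repair the half-written state itself; in layout B that case cannot occur for a single row.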

So, putting the lower cardinality data in another table with the same row key might be a performance
win, or it might not, depending on your read & write patterns. Try it both ways and compare,
and let us know what you find.

Ian

On Aug 20, 2012, at 7:25 AM, Pranav Modi wrote:

This might be useful -
http://java.dzone.com/videos/hbase-schema-design-things-you

On Mon, Aug 20, 2012 at 5:17 PM, Christian Schäfer <syrious3000@yahoo.de> wrote:

I'm currently designing HBase tables.

In my case there is table1, with CF1 holding millions/billions of rows and
CF2 holding hundreds of rows.
Read use cases include reading both CFs' data by key or reading only one CF.

Referring to http://hbase.apache.org/book/number.of.cfs.html

Due to the cardinality difference, I would change the schema design by
putting CF2 in an extra table (table2), right?
After that there would be table1 and table2, each with one CF and the same
row key.
Any objections to that?

Can anyone recommend resources about HBase schema design that explain it
for different use cases, beyond "HBase: The Definitive Guide" and the HBase
online reference?

regards,
Christian
