hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: 1 vs. N CFs, dense vs. sparse CFs, flushing
Date Thu, 08 Jan 2015 12:26:51 GMT

You have two issues. 

1) Physical structure and organization.
2) Logical organization and data usage. 

This goes to the question of your data access pattern and use case. 

The best example of how to use Column Families that I can think of is an order entry system.

Here you would have something like 4-5 CFs (Order, Pick Slips, Shipping, Invoice, and possibly Metadata).

Note that while there is some overlap of the data between CFs, this layout lets a query read
only one CF… maybe 2 if you’re accessing the metadata and it’s stored separately.
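
To make the order entry layout concrete, here is a rough sketch. It is a toy model built from plain Python dicts, not the HBase client API; the class name ToyTable, the family names, the row key and the column values are all made up for illustration. The only point it shows is that a read naming one CF touches that CF's store and nothing else.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a table: one independent store per column family."""

    def __init__(self, families):
        self.stores = {cf: defaultdict(dict) for cf in families}
        self.reads = defaultdict(int)  # how often each CF's store was read

    def put(self, row, cf, qualifier, value):
        self.stores[cf][row][qualifier] = value

    def get(self, row, families):
        # Only the stores of the requested families are consulted.
        result = {}
        for cf in families:
            self.reads[cf] += 1
            result[cf] = dict(self.stores[cf].get(row, {}))
        return result


# Hypothetical order entry table with five families.
orders = ToyTable(["order", "pick", "ship", "invoice", "meta"])
orders.put("order-0001", "order", "customer", "ACME")
orders.put("order-0001", "order", "sku", "widget-7")
orders.put("order-0001", "ship", "customer", "ACME")  # deliberate overlap
orders.put("order-0001", "ship", "carrier", "UPS")
orders.put("order-0001", "invoice", "total", "19.99")

# The shipping department only ever asks for the "ship" family.
shipping_view = orders.get("order-0001", ["ship"])
print(shipping_view)
print(dict(orders.reads))
```

The customer name is duplicated in "order" and "ship" on purpose: that overlap is what lets the shipping query stay inside a single CF.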


I’m sure that there are other models that could be used as an example, but this is one that
any classically trained database developer would understand. 
(Reservation Systems, Medical Billing, … could also be used.) 

So, while there are the physical issues of HBase managing N CFs per table, you still have to deal with
the design issue of when to use a CF. 
One of the first and most common mistakes is to think about HBase in terms of a Relational
Database. It’s not. Thinking of CFs as analogous to tables in the relational model will kill
your performance. 
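
On the physical side, the "lumpy CF" concern raised further down this thread can be sketched with a toy simulation. This is not HBase code and not the real flush policy from HBASE-10201; the function name simulate, the flush_size threshold and the write sizes are invented, and "flush only the largest store" is a deliberate simplification of a per-family flush decision. It compares flushing every CF together with flushing selectively when one dense CF shares a table with one sparse CF.

```python
def simulate(writes, flush_size, per_family):
    """Return {cf: [sizes of flushed files]} for a stream of (cf, size) writes.

    Toy model: a flush is triggered when the combined in-memory size of all
    families reaches flush_size.
    """
    memstore = {}
    files = {}
    for cf, size in writes:
        memstore[cf] = memstore.get(cf, 0) + size
        files.setdefault(cf, [])
        if sum(memstore.values()) >= flush_size:
            if per_family:
                # Simplified selective flush: only the largest store is written.
                victims = [max(memstore, key=memstore.get)]
            else:
                # Whole-table flush: every non-empty store is written.
                victims = [c for c in memstore if memstore[c] > 0]
            for c in victims:
                files[c].append(memstore[c])
                memstore[c] = 0
    return files


# Hypothetical workload: dense family "d" gets 9 units for every 1 unit
# written to sparse family "s".
writes = [("d", 9), ("s", 1)] * 10

whole = simulate(writes, flush_size=20, per_family=False)
selective = simulate(writes, flush_size=20, per_family=True)

print("flush all CFs together:", whole)
print("flush selectively:     ", selective)
```

In the first run the sparse family ends up with a pile of tiny files, one per flush of the dense family; in the second it stays in memory. Whether that matters for you depends on your access pattern, which is the logical design question again.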

Please understand that Otis’ question raises both issues (physical design and logical design).

The answer to Otis’ question is: it depends… 
You have a couple of factors and you need to approach this on a case-by-case basis. 

Please refrain from blogging about it until you understand the overall issue better. 

But hey! What do I know?  ;-) 


On Jan 7, 2015, at 10:42 PM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Thanks Ted!
> So with HBASE-10201 in place, would N sparsely populated CFs with the same
> key structure ever be a better choice than a single densely populated CF
> with the same key structure?
> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> On Wed, Jan 7, 2015 at 12:31 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> Please see HBASE-10201 which would come in 1.1.0 release.
>> Cheers
>> On Wed, Jan 7, 2015 at 9:10 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:
>>> Hi,
>>> I recently came across this good thread about 1 vs. N ColumnFamilies, the
>>> max recommended number of CFs, dense vs. sparse structure, etc. --
>>> http://search-hadoop.com/m/TozMw1jqh262
>>> This thread is from 2013. Even though people say HBase should handle more
>>> than 3 CFs, the docs still recommend to stick to 2-3 CFs.  Is that still
>>> the case?
>>> See http://hbase.apache.org/book.html#number.of.cfs
>>> Also, the thread talks about lumpy CFs and the fact that all CFs would have
>>> to be flushed whenever any one of them triggers compaction... but I
>>> remember something being changed in this space a while back.  No?
>>> Thanks,
>>> Otis
>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
