hbase-user mailing list archives

From "Srikanth P. Shreenivas" <Srikanth_Shreeni...@mindtree.com>
Subject RE: Tall-Narrow vs. Flat-Wide Tables
Date Fri, 02 Sep 2011 12:32:19 GMT
Thanks Dave.

In that case, I guess a correction is needed in the first chapter of HBase: The Definitive Guide
(http://ofps.oreilly.com/titles/9781449396107/intro.html), where it states:
----
As opposed to the limit on column families there is no such thing for the number of columns:
you could have millions of columns in a particular column family. There is also no type nor
length boundary on the column values.
----

If the email schema example below is bad schema design, not because of the
query/access pattern but because of the problems it can create for region splits, then the
above excerpt from the book should come with some fine print ;-)


Regards,
Srikanth




-----Original Message-----
From: Buttler, David [mailto:buttler1@llnl.gov] 
Sent: Friday, September 02, 2011 2:08 AM
To: user@hbase.apache.org
Subject: RE: Tall-Narrow vs. Flat-Wide Tables

The "HBase: The Definitive Guide" answer seems pretty, um, definitive to me.  The only reason
I would even consider going against that advice is if I had solid knowledge that it was impossible
for a user to have more than 100,000 emails.  But even then it seems like a difficult design
decision to justify.  How does that design help you do something?

Dave

-----Original Message-----
From: Srikanth P. Shreenivas [mailto:Srikanth_Shreenivas@mindtree.com] 
Sent: Thursday, September 01, 2011 11:53 AM
To: user@hbase.apache.org
Subject: Tall-Narrow vs. Flat-Wide Tables

Hi,

Chapter 9 of HBase: The Definitive Guide discusses tall-narrow vs. flat-wide tables
(http://ofps.oreilly.com/titles/9781449396107/advanced.html).

It seems to propose that tall-narrow tables (more rows, fewer columns) are the better design.  One
of the issues it raises with flat-wide tables (fewer rows, more columns) is:
...
In addition, HBase can only split at row boundaries, which also enforces the recommendation
to go with tall-narrow tables. Imagine you have all emails of a user in a single row. This
will work for the majority of users, but there will be outliers that will have magnitudes
of emails more in their inbox. So much so that a single row could outgrow the maximum file/region
size and work against the region split facility.
...
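
To make the excerpt concrete, here is a rough sketch of the two layouts in plain Python
(no HBase client involved). The composite row key "user_id-message_id", the function names
and the sample data are all assumptions made up for illustration. Since a split can only
fall between rows, the size of the largest single row is the number to watch.

```python
from collections import defaultdict

def flat_wide(emails):
    """One row per user; each message becomes a column (qualifier = message id)."""
    table = defaultdict(dict)
    for user_id, message_id, body in emails:
        table[user_id][message_id] = body
    return dict(table)

def tall_narrow(emails):
    """One row per message; hypothetical composite row key 'user_id-message_id'."""
    return {f"{user_id}-{message_id}": {"body": body}
            for user_id, message_id, body in emails}

def largest_row_bytes(table):
    """Size of the biggest row, counting only the cell values."""
    return max(sum(len(value) for value in row.values()) for row in table.values())

# Made-up data: one heavy user with 500 messages of 10,000 characters, one light user.
emails = [("alice", f"{i:06d}", "x" * 10_000) for i in range(500)]
emails += [("bob", "000000", "x" * 10_000)]

wide = flat_wide(emails)
tall = tall_narrow(emails)
print(len(wide), largest_row_bytes(wide))  # few rows, one of them very large
print(len(tall), largest_row_bytes(tall))  # many rows, each one small
```

In the flat-wide layout the heavy user's row grows with every message, while in the
tall-narrow layout the same data is spread over many small rows that a split can separate.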

So, my question is: is it a bad idea to have a table like the one in the book's email example, where emails
are stored by adding columns?  I have a similar table in my application, with
a region size of 1GB and a cell value size of 10KB.  Will I run into the region-split issue
mentioned above after about 100,000 columns (1GB / 10KB = 100,000)?
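
The estimate above can be checked with a few lines of Python. This is only a
back-of-the-envelope sketch: the function name is made up, and the 64-byte per-cell overhead
is a placeholder assumption rather than a measured figure. HBase stores the key (row, family,
qualifier, timestamp) with every cell, and compression changes on-disk size, so the real
threshold will differ.

```python
def cells_to_fill_region(region_bytes, value_bytes, per_cell_overhead_bytes=0):
    """Number of whole cells that fit before a single row reaches region_bytes."""
    return region_bytes // (value_bytes + per_cell_overhead_bytes)

# 1 GB region, 10 KB values, using powers of ten.
decimal = cells_to_fill_region(10**9, 10 * 10**3)

# Same figures using powers of two (1 GiB, 10 KiB).
binary = cells_to_fill_region(2**30, 10 * 2**10)

# Hypothetical 64 bytes of key overhead per cell (assumption, not a measured value).
with_overhead = cells_to_fill_region(2**30, 10 * 2**10, per_cell_overhead_bytes=64)

print(decimal, binary, with_overhead)
```

Either way the answer lands near 100,000 cells, and any per-cell overhead only lowers it.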

Regards,
Srikanth

