hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/DataModel" by JeanDanielCryans
Date Tue, 13 Jan 2009 16:14:08 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by JeanDanielCryans:

The comment on the change is:
Added the HBase schema

  == The HBase Target Schema ==
+ A first solution could be:
+ ||Table||Row Key||Family||Attributes||
+ ||blogtable||TTYYYYMMDDHHmmss||info:||Always contains the column keys author, title and under_title. Should be IN-MEMORY and keep 1 version||
+ ||     || ||text:||No column key. 3 versions||
+ ||     || ||comment_title:||Column keys are written like YYYYMMDDHHmmss. Should be IN-MEMORY and keep 1 version||
+ ||     || ||comment_author:||Same keys. 1 version||
+ ||     || ||comment_text:||Same keys. 1 version||
+ ||usertable||login_name||info:||Always contains the column keys password and name. 1 version||
+ The row key for blogtable is a concatenation of its type (shortened to 2 letters) and its
timestamp. This way, the rows are grouped first by type and then by date throughout the
cluster, which raises the chance of hitting a single region when fetching the needed data. The
one-to-many relationship between BLOGENTRY and COMMENT is handled by making each attribute of
a comment a family in blogtable and by using the comment's date as the column key, so all
comments are already sorted.
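
As a rough illustration of the key layout described above, here is a minimal Python sketch. It does not talk to HBase; it only builds the strings. The helper names `make_row_key` and `make_comment_column`, and the 2-letter type codes "te" and "ne", are hypothetical and are not part of the wiki page. The only HBase fact relied on is that rows are stored sorted by row key, which for plain ASCII keys matches Python's string ordering.

```python
from datetime import datetime

TS_FORMAT = "%Y%m%d%H%M%S"  # YYYYMMDDHHmmss, as in the schema table


def make_row_key(type_code: str, created: datetime) -> str:
    """Build a blogtable row key: 2-letter type code + YYYYMMDDHHmmss.

    How a type is shortened to 2 letters is not specified in the page,
    so the caller passes the code directly (an assumption of this sketch).
    """
    if len(type_code) != 2:
        raise ValueError("type code must be exactly 2 letters")
    return type_code + created.strftime(TS_FORMAT)


def make_comment_column(family: str, posted: datetime) -> str:
    """Build a comment column name such as comment_title:20090113161408."""
    return f"{family}:{posted.strftime(TS_FORMAT)}"


# Hypothetical entries of two types, inserted in no particular order.
keys = [
    make_row_key("te", datetime(2009, 1, 13, 16, 14, 8)),
    make_row_key("ne", datetime(2009, 1, 10, 9, 0, 0)),
    make_row_key("te", datetime(2008, 12, 25, 8, 30, 0)),
]

# Sorting the keys groups them by type first, then by date.
sorted_keys = sorted(keys)
for key in sorted_keys:
    print(key)
```

Because the timestamp is fixed-width and zero-padded, string order and chronological order agree within one type prefix. The same holds for the comment column keys inside each comment family.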
+ One advantage of this design is that when you show the "front page" of your blog, you only
have to fetch the family "info:" from blogtable. When you show an actual blog entry, you fetch
a whole row. Another advantage is that by using timestamps in the row key, your scanner will
fetch sequential rows if you want to show, for example, the entries from the last month.
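
The two access patterns above can be mimicked with a toy in-memory table. This is only a sketch under stated assumptions: `blogtable` here is a plain Python dict with made-up sample rows, and `scan` is a hypothetical helper that imitates a range scan (start row inclusive, stop row exclusive) restricted to one family. It is not the HBase client API.

```python
from bisect import bisect_left

# Toy stand-in for blogtable: row key -> {"family:qualifier": value}.
# All rows and values are invented sample data.
blogtable = {
    "ne20090110090000": {"info:author": "bob", "info:title": "News item", "text:": "news body"},
    "te20081225083000": {"info:author": "alice", "info:title": "Old post", "text:": "old body"},
    "te20090105120000": {"info:author": "alice", "info:title": "New year", "text:": "body one"},
    "te20090113161408": {"info:author": "carol", "info:title": "Schema", "text:": "body two"},
}


def scan(table, start_row, stop_row, family=None):
    """Return (row key, columns) pairs with start_row <= key < stop_row.

    Rows come back in key order. If family is given (for example "info:"),
    only columns whose name starts with that family are kept.
    """
    keys = sorted(table)
    lo = bisect_left(keys, start_row)
    hi = bisect_left(keys, stop_row)
    rows = []
    for key in keys[lo:hi]:
        columns = table[key]
        if family is not None:
            columns = {c: v for c, v in columns.items() if c.startswith(family)}
        rows.append((key, columns))
    return rows


# "Front page" for type "te" in January 2009: only the info: family.
front_page = scan(blogtable, "te20090101000000", "te20090201000000", family="info:")

# Showing one entry: fetch the whole row.
full_entry = blogtable["te20090113161408"]
```

The range scan touches only neighbouring keys, which is the point of putting the type and timestamp at the front of the row key.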
