hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hbase/DataModel" by JeanDanielCryans
Date Sun, 13 Jul 2008 16:24:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by JeanDanielCryans:
http://wiki.apache.org/hadoop/Hbase/DataModel

------------------------------------------------------------------------------
  [[Anchor(overview)]]
  = Overview =
  
- To put it simply, HBase can be reduced to a Map<byte[], Map<byte[], Map<byte[],
Map<long, byte[]>>>>. The first Map maps row keys to their ''column families''.
The second maps column families to their ''column keys''. The third one maps column keys to
their ''timestamps''. Finally, the last one maps the timestamps to a single value. The keys
are typically strings, the timestamp is a long and the value is an uninterpreted array of
bytes. The column key is always preceded by its family and is represented like this: ''family:key''.
Since a family maps to another map, this means that a single column family can contain a theoretical
infinity of column keys. So, to retrieve a single value, the user has to do a ''get'' using
three keys:
+ To put it simply, HBase can be reduced to a Map<byte[], Map<byte[], Map<byte[],
Map<Long, byte[]>>>>. The first Map maps row keys to their ''column families''.
The second maps column families to their ''column keys''. The third maps column keys to
their ''timestamps''. Finally, the last one maps each timestamp to a single value. The keys
are typically strings, the timestamp is a long, and the value is an uninterpreted array of
bytes. A column key is always prefixed by its family and is written as ''family:key''.
Since a family maps to another map, a single column family can, in theory, contain an unlimited
number of column keys. So, to retrieve a single value, the user performs a ''get'' using
three keys:
  
  row key+column key+timestamp -> value
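A minimal sketch of the nested-map view above, in plain Java. It is a toy in-memory model, not the HBase client API: the class `DataModelSketch`, its `put`/`get` methods and the `info:title` column used below are hypothetical names chosen for illustration. Because `byte[]` uses identity for `equals`/`hashCode`, the maps are `TreeMap`s ordered by unsigned byte content via `Arrays.compareUnsigned` (Java 9+).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the nested maps described above; NOT the HBase client API.
class DataModelSketch {
    // Orders byte[] keys by content (unsigned, lexicographic) instead of identity.
    static final Comparator<byte[]> BYTES = Arrays::compareUnsigned;

    // row key -> column family -> column key -> timestamp -> value
    final NavigableMap<byte[], NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>>> rows =
            new TreeMap<>(BYTES);

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Splits "family:key" at the first ':' into {family, key}.
    static String[] splitColumn(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected family:key, got " + column);
        }
        return new String[] {column.substring(0, colon), column.substring(colon + 1)};
    }

    // Creates the intermediate maps on demand, then stores the value under its timestamp.
    void put(String row, String column, long timestamp, byte[] value) {
        String[] fk = splitColumn(column);
        rows.computeIfAbsent(bytes(row), r -> new TreeMap<>(BYTES))
            .computeIfAbsent(bytes(fk[0]), f -> new TreeMap<>(BYTES))
            .computeIfAbsent(bytes(fk[1]), k -> new TreeMap<>())
            .put(timestamp, value);
    }

    // row key + column key + timestamp -> value; returns null if any level is missing.
    byte[] get(String row, String column, long timestamp) {
        String[] fk = splitColumn(column);
        NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> families = rows.get(bytes(row));
        if (families == null) return null;
        NavigableMap<byte[], NavigableMap<Long, byte[]>> keys = families.get(bytes(fk[0]));
        if (keys == null) return null;
        NavigableMap<Long, byte[]> versions = keys.get(bytes(fk[1]));
        return versions == null ? null : versions.get(timestamp);
    }
}
```

After storing a value at timestamp 1 under row `row1` and column `info:title`, `get("row1", "info:title", 1L)` returns it, while any other row, column or timestamp yields `null`; each of the three keys narrows the lookup by one level.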
  
@@ -54, +54 @@

  The following attributes can be specified for each family:
  
  Implemented
- 
   * Compression
    * Record: each individual value stored at a rowkey+columnkey+timestamp is compressed
independently.
    * Block: HDFS blocks are compressed. A block may contain multiple records
if they are smaller than one HDFS block, or only part of a record if the record
is larger than an HDFS block.
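To make the Record/Block distinction concrete, here is a rough sketch that uses `java.util.zip.Deflater` as a stand-in codec (an assumption; it says nothing about which codecs HBase or HDFS actually use) and a hypothetical `blockSize` far smaller than a real HDFS block. Record-style compresses every value on its own; block-style concatenates the records and compresses fixed-size chunks, so one chunk may hold several small records or only part of a large one.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical comparison of record-level vs block-level compression; Deflater is a stand-in codec.
class CompressionSketch {
    // Compresses input[off, off + len) with Deflater and returns the compressed size in bytes.
    static int deflatedSize(byte[] input, int off, int len) {
        Deflater deflater = new Deflater();
        deflater.setInput(input, off, len);
        deflater.finish();
        byte[] out = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(out);
        }
        deflater.end();
        return total;
    }

    // Record-style: every value is compressed independently of its neighbours.
    static int recordCompressed(List<byte[]> records) {
        int total = 0;
        for (byte[] r : records) {
            total += deflatedSize(r, 0, r.length);
        }
        return total;
    }

    // Block-style: records are concatenated and the stream is cut into fixed-size blocks,
    // so a block can contain many small records or just a slice of a large one.
    static int blockCompressed(List<byte[]> records, int blockSize) {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        for (byte[] r : records) {
            stream.write(r, 0, r.length);
        }
        byte[] all = stream.toByteArray();
        int total = 0;
        for (int off = 0; off < all.length; off += blockSize) {
            total += deflatedSize(all, off, Math.min(blockSize, all.length - off));
        }
        return total;
    }
}
```

For many small, similar values, the block-style total usually comes out smaller, because each independently compressed record pays its own header overhead and cannot reuse context from the records around it.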
@@ -63, +62 @@

    * Time to live: versions older than the specified time will be garbage collected.
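A sketch of the time-to-live rule applied to one column's version map (timestamp -> value), assuming timestamps and the TTL are both expressed in milliseconds. The `TtlSketch.expire` helper is hypothetical: it only shows which versions would be eligible for garbage collection, not when or how HBase actually removes them.

```java
import java.util.NavigableMap;

// Hypothetical TTL filter over a single column's versions; NOT HBase's garbage collector.
class TtlSketch {
    // Removes every version whose timestamp is strictly older than (now - ttl).
    static void expire(NavigableMap<Long, byte[]> versions, long ttlMillis, long nowMillis) {
        long cutoff = nowMillis - ttlMillis;
        // headMap(cutoff, false) is a live view of keys < cutoff; clearing it edits the backing map.
        versions.headMap(cutoff, false).clear();
    }
}
```

With versions at timestamps 100, 200 and 300, a TTL of 150 and a current time of 400, the cutoff is 250, so only the version at 300 survives.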
  
  Still not implemented
- 
   * In memory: all values of that family will be kept in memory.
-  * Length: values written will not be longer than the specified number of bytes.
+  * Length: values written will not be longer than the specified number of bytes. See [https://issues.apache.org/jira/browse/HBASE-742 HBASE-742].
  
  [[Anchor(example)]]
  = Real Life Example =
+ The following example is the same one given during the HBase ETS presentation, available in French
on the presentation page.
+ 
+ A blog is a good way to demonstrate the HBase data model because of its simple
features and domain. Suppose the following mini-SRS:
+  * The blog entries, which consist of a title, a subtitle, a date, an author, a type
(or tag), a text, and comments, can be created and updated by logged-in users.
+  * The users, which consist of a username, a password, and a name, can log in and log out.
+  * The comments, which consist of a title, an author, and text, can be written anonymously
by visitors as long as their identity is verified by a captcha.
  
  [[Anchor(relational)]]
  == The Source ERD ==
  
+ http://www.hadoop.ca/img/db_blog.jpg
+ 
  [[Anchor(hbaseschema)]]
  == The HBase Target Schema ==
  
