hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/HBaseIntegration" by JohnSichi
Date Fri, 04 Jun 2010 21:22:51 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/HBaseIntegration" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/HBaseIntegration?action=diff&rev1=28&rev2=29

--------------------------------------------------

  {{{
  CREATE TABLE hbase_table_1(key int, value string) 
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
- WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val")
+ WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
  TBLPROPERTIES ("hbase.table.name" = "xyz");
  }}}
  
@@ -140, +140 @@

   * for each Hive column, the table creator must specify a corresponding entry in the comma-delimited
{{{hbase.columns.mapping}}} string (so for a Hive table with n columns, the string should
have n entries); whitespace should '''not''' be used between entries since it will be
interpreted as part of the column name, which is almost certainly not what you want
   * a mapping entry must be either {{{:key}}} or of the form {{{column-family-name:[column-name]}}}
   * there must be exactly one {{{:key}}} mapping (we don't support compound keys yet)
-  ** note that before HIVE-1228, {{{:key}}} was not supported, and the first Hive column
implicitly mapped to the key; as of HIVE-1228, it is now strongly recommended that you always
specify the key explicitly; we will drop support for implicit key mapping in the future
+  * (note that before HIVE-1228, {{{:key}}} was not supported, and the first Hive column
implicitly mapped to the key; as of HIVE-1228, it is now strongly recommended that you always
specify the key explicitly; we will drop support for implicit key mapping in the future)
   * if no column-name is given, then the Hive column will map to all columns in the corresponding
HBase column family, and the Hive MAP datatype must be used to allow access to these (possibly
sparse) columns
   * there is currently no way to access the HBase timestamp attribute, and queries always
access data with the latest timestamp.
   * since HBase does not associate datatype information with columns, the serde converts
everything to string representation before storing it in HBase; there is currently no way
to plug in a custom serde per column
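
The mapping rules above can be checked mechanically before a table is created. The following is only a sketch in Python, not part of Hive: the function name check_columns_mapping and its messages are hypothetical, and it checks the explicit {{{:key}}} form only, so it will flag the older implicit key mapping used before HIVE-1228.

```python
from typing import List


def check_columns_mapping(mapping: str, hive_column_count: int) -> List[str]:
    """Return the problems found in an hbase.columns.mapping string.

    Hypothetical helper: it only encodes the rules listed above and does
    not call into Hive or HBase.
    """
    problems = []
    entries = mapping.split(",")

    # Rule: a Hive table with n columns needs n mapping entries.
    if len(entries) != hive_column_count:
        problems.append(
            f"expected {hive_column_count} entries, found {len(entries)}"
        )

    key_count = 0
    for entry in entries:
        # Rule: whitespace would be read as part of the column name.
        if any(ch.isspace() for ch in entry):
            problems.append(f"entry {entry!r} contains whitespace")

        if entry == ":key":
            key_count += 1
            continue

        # Rule: every other entry is column-family-name:[column-name].
        family, separator, _column = entry.partition(":")
        if not separator or not family:
            problems.append(
                f"entry {entry!r} is not ':key' or 'family:[column]'"
            )

    # Rule: exactly one :key entry (no compound keys).
    if key_count != 1:
        problems.append(f"expected exactly one ':key' entry, found {key_count}")

    return problems


if __name__ == "__main__":
    print(check_columns_mapping(":key,cf1:val", 2))
    print(check_columns_mapping(":key, cf1:val", 2))
```

An empty list means the string satisfies the rules as stated here; anything else names the rule that was broken.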
@@ -210, +210 @@

  correspond to the map values.
  
  {{{
- CREATE TABLE hbase_table_1(key int, value map<string,int>) 
+ CREATE TABLE hbase_table_1(value map<string,int>, row_key int) 
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES (
- "hbase.columns.mapping" = ":key,cf:"
+ "hbase.columns.mapping" = "cf:,:key"
  );
- INSERT OVERWRITE TABLE hbase_table_1 SELECT foo, map(bar, foo) FROM pokes 
+ INSERT OVERWRITE TABLE hbase_table_1 SELECT map(bar, foo), foo FROM pokes 
  WHERE foo=98 OR foo=100;
  }}}
+ 
+ (This example also demonstrates using a Hive column other than the first as the HBase row
key.)
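
Since the mapping entries line up with the Hive columns by position, the Hive column that acts as the row key is the one at the position of {{{:key}}}. A small Python sketch of that lookup follows; row_key_column is a hypothetical helper for illustration, not a Hive API.

```python
from typing import List


def row_key_column(hive_columns: List[str], mapping: str) -> str:
    """Return the Hive column that the mapping ties to the HBase row key.

    Hypothetical helper: pairs Hive columns with mapping entries by position.
    """
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError("mapping must have one entry per Hive column")
    if entries.count(":key") != 1:
        raise ValueError("mapping must contain exactly one ':key' entry")
    return hive_columns[entries.index(":key")]


if __name__ == "__main__":
    # Column order taken from the CREATE TABLE statement above.
    print(row_key_column(["value", "row_key"], "cf:,:key"))
```

For the table above this returns row_key, the second Hive column.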
  
  Here's how this looks in HBase (with different column names in different rows):
  
@@ -237, +239 @@

  Launching Job 1 out of 1
  ...
  OK
- 100	{"val_100":100}
+ {"val_100":100}	100
- 98	{"val_98":98}
+ {"val_98":98}	98
  Time taken: 3.808 seconds
  }}}
  
