hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/HBaseBulkLoad" by JohnSichi
Date Fri, 16 Apr 2010 23:56:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/HBaseBulkLoad" page has been changed by JohnSichi.
http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad?action=diff&rev1=11&rev2=12

--------------------------------------------------

  set hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
  set total.order.partitioner.natural.order=false;
  set total.order.partitioner.path=/tmp/hb_range_key_list;
- set hfile.compression='gz';
+ set hfile.compression=gz;
  
  create table hbsort(transaction_id string, user_name string, amount double, ...)
  stored as
@@ -129, +129 @@

  cluster by transaction_id;
  }}}
  
- The CREATE TABLE creates a dummy table which controls how the output of the sort is written.  Note that it uses {{{HiveHFileOutputFormat}}} to do this, with the table property {{{hfile.family.path}}} used to control the destination directory for the output.  Again, be sure to set the inputformat/outputformat exactly as specified.  In the example above, we select gzip ('gz') compression for the result files; if you don't set the {{{hfile.compression}}} parameter, no compression will be performed.  (The other method available is 'lzo', which compresses less aggressively but does not require as much CPU power.)
+ The CREATE TABLE creates a dummy table which controls how the output of the sort is written.  Note that it uses {{{HiveHFileOutputFormat}}} to do this, with the table property {{{hfile.family.path}}} used to control the destination directory for the output.  Again, be sure to set the inputformat/outputformat exactly as specified.  In the example above, we select gzip (gz) compression for the result files; if you don't set the {{{hfile.compression}}} parameter, no compression will be performed.  (The other method available is lzo, which compresses less aggressively but does not require as much CPU power.)
  
  The {{{cf}}} in the path specifies the name of the column family which will be created in HBase, so the directory name you choose here is important.  (Note that we're not actually using an HBase table here; {{{HiveHFileOutputFormat}}} writes directly to files.)
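
The paragraph above says the last directory in hfile.family.path becomes the HBase column family name. As a rough illustration, the hypothetical Python helper below extracts that name from a path value. The function name and the sample path /tmp/hbsort/cf are assumptions for this sketch (the actual path used in the wiki example falls outside this diff), and the code models only the naming rule, not anything HiveHFileOutputFormat does internally.

```python
import posixpath


def column_family_from_path(family_path: str) -> str:
    """Return the final directory name of an hfile.family.path value.

    Hypothetical helper: it assumes, per the wiki text, that this last
    path component is used as the HBase column family name.
    """
    # normpath drops a trailing "/", so "/tmp/hbsort/cf/" and
    # "/tmp/hbsort/cf" give the same result.
    family = posixpath.basename(posixpath.normpath(family_path))
    if family in ("", ".", ".."):
        raise ValueError(f"no usable directory name in {family_path!r}")
    return family


if __name__ == "__main__":
    # Hypothetical path; choose the last directory to match the family you want.
    print(column_family_from_path("/tmp/hbsort/cf"))
```

Renaming the last directory (for example to /tmp/hbsort/info) would, under this rule, change the column family the files are meant for, which is why the wiki stresses choosing it carefully.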
  
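
The first three set lines in the diff route rows through Hadoop's TotalOrderPartitioner, which reads split keys from the file named by total.order.partitioner.path (here /tmp/hb_range_key_list) so that each reducer receives one contiguous key range. The Python sketch below models only the lookup step. It assumes that natural.order=false selects a plain binary search over sorted split keys and that keys compare as ordinary strings; the helper names and the sample split keys are hypothetical.

```python
import bisect


def total_order_partition(key: str, split_points: list[str]) -> int:
    """Return the partition index for key, given sorted split keys.

    n split keys define n + 1 partitions. A key equal to a split key
    lands in the partition that begins at that split key, which is what
    bisect_right returns.
    """
    return bisect.bisect_right(split_points, key)


def partition_keys(keys: list[str], split_points: list[str]) -> list[list[str]]:
    """Bucket keys by partition, then sort each bucket as a reducer would."""
    if any(a >= b for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split keys must be strictly increasing")
    buckets: list[list[str]] = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        buckets[total_order_partition(key, split_points)].append(key)
    return [sorted(bucket) for bucket in buckets]


if __name__ == "__main__":
    splits = ["g", "p"]  # hypothetical stand-in for the range key list file
    parts = partition_keys(["x", "b", "g", "q", "h", "a"], splits)
    # Reading the partitions in order yields one globally sorted sequence.
    merged = [key for part in parts for key in part]
    print(parts, merged == sorted(merged))
```

Because every key in partition i sorts before every key in partition i + 1, the per-reducer output files can be laid end to end without overlapping key ranges.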

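The change at the top of this diff drops the quotes from the hfile.compression setting. A likely reason, which this sketch assumes rather than confirms, is that Hive's set command keeps the quote characters as part of the value, so 'gz' would not match the codec name gz. The hypothetical helpers below model that parsing rule and check the result against the two values the wiki text names, gz and lzo (leaving the property unset means no compression). The function names are illustrative only.

```python
# Values the wiki text names for hfile.compression; leaving the
# property unset means no compression.
ALLOWED_HFILE_COMPRESSION = {"gz", "lzo"}


def parse_set_line(line: str) -> tuple[str, str]:
    """Split a 'set key=value;' statement into (key, value).

    Assumption: the value is everything after the first "=", with only
    the trailing ";" and surrounding whitespace removed, so any quote
    characters are kept.
    """
    body = line.strip()
    if not body.lower().startswith("set "):
        raise ValueError(f"not a set statement: {line!r}")
    body = body[4:].rstrip(";").strip()
    key, sep, value = body.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in: {line!r}")
    return key.strip(), value.strip()


def check_hfile_compression(value: str) -> None:
    """Raise if the value is not one of the documented codec names."""
    if value not in ALLOWED_HFILE_COMPRESSION:
        raise ValueError(f"unsupported hfile.compression value: {value!r}")


if __name__ == "__main__":
    for stmt in ("set hfile.compression='gz';", "set hfile.compression=gz;"):
        key, value = parse_set_line(stmt)
        try:
            check_hfile_compression(value)
            print(f"{key} -> {value!r}: ok")
        except ValueError as err:
            print(f"{key} -> {value!r}: {err}")
```

Under this model the old line yields the four-character value 'gz' and fails the check, while the corrected line yields gz and passes.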