hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/HBaseBulkLoad" by JohnSichi
Date Fri, 16 Apr 2010 23:32:46 GMT

The "Hive/HBaseBulkLoad" page has been changed by JohnSichi.


  The second command populates it (using the sampling query previously defined).  Usage of ORDER BY guarantees that a single file will be produced in directory {{{/tmp/hb_range_keys}}}.  The filename is unknown, but it is necessary to reference the file by name later, so run a command such as the following to copy it to a specific name:
- dfs -cp /tmp/hb_range_keys/* /tmp/hb_range_key_list
+ dfs -cp /tmp/hb_range_keys/* /tmp/hb_range_key_list;
  = Prepare Staging Location =
  The sort is going to produce a lot of data, so make sure you have sufficient space in your HDFS cluster, and choose the location where the files will be staged.  We'll use {{{/tmp/hbsort}}} in this example.
+ The directory does not actually need to exist (it will be automatically created in the next step), but if it does exist, it should be empty.
+ {{{
+ dfs -rmr /tmp/hbsort;
+ dfs -mkdir /tmp/hbsort;
+ }}}
  = Sort Data =
