hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/HBaseBulkLoad" by JohnSichi
Date Fri, 04 Feb 2011 20:49:14 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/HBaseBulkLoad" page has been changed by JohnSichi.


  In order to perform a parallel sort on the data, we need to range-partition it.  The idea
is to divide the space of row keys into nearly equal-sized ranges, one for each reducer used
in the parallel sort.  The details will vary according to your source data, and you may need
to run a number of exploratory Hive queries to come up with a good enough set of ranges.
Here's one example:
+ add jar lib/hive_contrib.jar;
  set mapred.reduce.tasks=1;
  create temporary function row_sequence as 
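
The range-partitioning idea described above can also be illustrated outside Hive. The following is only a minimal in-memory sketch in Python, not the Hive query from the wiki page: it assumes a sample of row keys fits in memory, and the names choose_split_keys and assign_range are hypothetical helpers introduced for illustration. For N reducers it picks N - 1 boundary keys from the sorted sample, so that each reducer receives a nearly equal share of the sampled keys.

```python
import bisect


def choose_split_keys(sample_keys, num_reducers):
    """Pick num_reducers - 1 boundary keys from a sample of row keys.

    The boundaries cut the sorted sample into nearly equal-sized ranges,
    one range per reducer. (Hypothetical helper, for illustration only.)
    """
    if num_reducers < 1:
        raise ValueError("num_reducers must be at least 1")
    keys = sorted(sample_keys)
    if not keys:
        raise ValueError("sample_keys must not be empty")
    n = len(keys)
    # Take the key at each 1/num_reducers quantile of the sorted sample.
    return [keys[(i * n) // num_reducers] for i in range(1, num_reducers)]


def assign_range(key, split_keys):
    """Return the index of the range (reducer) that a row key falls into.

    Keys below the first boundary go to range 0; a key equal to a
    boundary goes to the range that starts at that boundary.
    """
    return bisect.bisect_right(split_keys, key)


if __name__ == "__main__":
    sample = list(range(100))          # stand-in for sampled row keys
    splits = choose_split_keys(sample, 4)
    print(splits)
    print([assign_range(k, splits) for k in (0, 25, 60, 99)])
```

How well the ranges balance depends on how representative the sample is, which is why exploratory queries against the real source data may be needed before settling on a set of boundaries.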
