hbase-user mailing list archives

From Andrew Purtell <apurt...@yahoo.com>
Subject Re: Using Hbase as data sink
Date Tue, 23 Dec 2008 20:15:39 GMT
> From: stack <stack@duboce.net>
> Subject: Re: Using Hbase as data sink
> To: hbase-user@hadoop.apache.org
> Date: Tuesday, December 23, 2008, 8:05 AM
> Jim Twensky wrote:
> > ...
> > Why do we need to set the number of the reduce tasks
> > according to the number of regions? Would it make a
> > performance difference?
> >   
> 
> Regions are the 'natural' division in hbase.  My
> guess is that the partitioner was an attempt at calculating
> an N for reducers that was other than 1 or just some
> hard-coding.
 
I use the natural log of the number of regions (plus one, rounded
up) as the number of reduce tasks to run, together with the default
partitioner, which spreads the load roughly evenly across the
reducers using a hash of the key. More precisely:

  HTable table = new HTable(conf, tableName);
  // one start key per region, so length == number of regions
  int nrReducers =
    (int) Math.ceil(
      Math.log1p((double) table.getStartKeys().length));
  // ...
  job.setNumReduceTasks(nrReducers);

I have no formal reason for this. The method just has a nice
feel to it.
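
To give a feel for the numbers, here is a small standalone sketch of
the same formula with no HBase dependency. The class name
ReducerCount and the helpers reducersFor and partitionFor are made up
for illustration. partitionFor only mirrors what I understand Hadoop's
default HashPartitioner to do (hash code masked to non-negative, modulo
the number of reducers); treat it as an assumption, not a copy of the
real class.

```java
public class ReducerCount {
    // Same formula as the snippet above: ceil(ln(1 + regions)).
    static int reducersFor(int regions) {
        return (int) Math.ceil(Math.log1p((double) regions));
    }

    // Assumed equivalent of the default hash partitioning:
    // mask off the sign bit, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int[] regionCounts = {1, 2, 10, 100, 1000};
        for (int regions : regionCounts) {
            System.out.println(regions + " regions -> "
                + reducersFor(regions) + " reducers");
        }
        // Hypothetical row key, just to show a key landing on a reducer.
        int reducers = reducersFor(100);
        System.out.println("row-42 -> reducer "
            + partitionFor("row-42", reducers) + " of " + reducers);
    }
}
```

So the reducer count grows very slowly: 1 region gives 1 reducer, 10
regions give 3, 100 give 5, and 1000 give 7.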

   - Andy



      
