hbase-user mailing list archives

From "Jonathan M. Kupferman" <jkupfer...@umail.ucsb.edu>
Subject Splitting within regions
Date Thu, 08 May 2008 20:32:12 GMT
Hi Everyone,
I am currently attempting to run a MapReduce job whose input
comes from HBase. The input table has 22 regions, so the job creates 22
map tasks. So few map tasks distribute the work poorly across a cluster
of 10+ machines, especially since the amount of work required varies
widely from region to region.

I would like to increase the number of map tasks at least twofold.
The relevant code seems to be in TableInputFormat:

//Original code
      Text[] startKeys = m_table.getStartKeys();
      if(startKeys == null || startKeys.length == 0) {
        throw new IOException("Expecting at least one region");
      }
      // One split per region: [start key of region i, start key of region i+1).
      // The last region's split ends at the empty key, i.e. the end of the table.
      InputSplit[] splits = new InputSplit[startKeys.length];
      for(int i = 0; i < startKeys.length; i++) {
        splits[i] = new TableSplit(m_tableName, startKeys[i],
            ((i + 1) < startKeys.length) ? startKeys[i + 1] : new Text());
      }
//end-original
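For reference, the original loop can be restated with plain Java strings standing in for Hadoop's Text (the RegionRanges class and Range type below are hypothetical, for illustration only). Each region becomes one [start, end) range whose end is the next region's start key, and the last range ends at the empty key, meaning the end of the table. HBase's first region likewise begins at the empty key, so any code that parses start keys as numbers has to special-case it.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionRanges {

    /** A [start, end) row-key range; an empty string means "open-ended". */
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** One range per region, mirroring the loop in the original TableInputFormat code. */
    static List<Range> ranges(String[] startKeys) {
        if (startKeys == null || startKeys.length == 0) {
            throw new IllegalArgumentException("Expecting at least one region");
        }
        List<Range> out = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            // The end of region i is the start of region i+1; the last region is open-ended.
            String end = (i + 1 < startKeys.length) ? startKeys[i + 1] : "";
            out.add(new Range(startKeys[i], end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical 3-region table; the first region starts at the empty key.
        for (Range r : ranges(new String[] {"", "m", "t"})) {
            System.out.println(r);
        }
    }
}
```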

//Modified code
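      Text[] startKeys = m_table.getStartKeys();
      if(startKeys == null || startKeys.length == 0) {
        throw new IOException("Expecting at least one region");
      }
      // Uses java.util.List/ArrayList. Assumes row keys are fixed-width,
      // zero-padded decimal integers, so numeric and byte order agree.
      List<InputSplit> splitList = new ArrayList<InputSplit>(startKeys.length * 2);
      for(int i = 0; i < startKeys.length; i++) {
        Text start = startKeys[i];
        Text end = ((i + 1) < startKeys.length) ? startKeys[i + 1] : new Text();
        if(start.getLength() == 0 || end.getLength() == 0) {
          // First and last regions have an empty (open) key; keep them whole.
          splitList.add(new TableSplit(m_tableName, start, end));
          continue;
        }
        // Midpoint of [start, end), zero-padded back to the key width.
        long mid = (Long.parseLong(start.toString()) + Long.parseLong(end.toString())) / 2;
        Text halfsplit = new Text(String.format("%0" + start.getLength() + "d", mid));
        splitList.add(new TableSplit(m_tableName, start, halfsplit));
        splitList.add(new TableSplit(m_tableName, halfsplit, end));
      }
      InputSplit[] splits = splitList.toArray(new InputSplit[splitList.size()]);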
//end-modified

It seems like the required modifications would be something along the
lines of the code written above. Is this the correct/best way to go about
this?
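A variant that does not assume numeric keys is sketched below, under the assumption that row keys sort as unsigned byte strings (HBase's row ordering). It reads a region's start and end keys as big-endian numbers, right-padded with one extra byte of resolution, and takes k - 1 evenly spaced cut points between them; k = 2 gives a midpoint. An empty start or end key is treated as the beginning or end of the table. The KeyRangeSplitter class and its methods are hypothetical names, not part of HBase, and the generated keys are generally not printable.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyRangeSplitter {

    /** Unsigned lexicographic comparison of two keys. */
    static int compare(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xff) - (y[i] & 0xff);
            if (d != 0) return d;
        }
        return x.length - y.length; // a proper prefix sorts first
    }

    /** Right-pads key with zero bytes to width n and reads it as an unsigned integer. */
    static BigInteger toNumber(byte[] key, int n) {
        return new BigInteger(1, Arrays.copyOf(key, n));
    }

    /** Writes value as exactly n big-endian bytes (value must be below 2^(8n)). */
    static byte[] toBytes(BigInteger value, int n) {
        byte[] raw = value.toByteArray();          // may carry a leading 0 sign byte
        int from = Math.max(0, raw.length - n);
        int len = raw.length - from;
        byte[] out = new byte[n];
        System.arraycopy(raw, from, out, n - len, len);
        return out;
    }

    /**
     * Returns up to k-1 keys cutting [start, end) into k roughly equal pieces of key space.
     * Empty start = beginning of table, empty end = end of table.
     * Cut points that would not fall strictly inside the range are dropped.
     */
    static List<byte[]> splitPoints(byte[] start, byte[] end, int k) {
        int n = Math.max(start.length, end.length) + 1; // one extra byte of resolution
        BigInteger lo = toNumber(start, n);
        BigInteger hi = end.length == 0
                ? BigInteger.ONE.shiftLeft(8 * n)       // one past the largest n-byte key
                : toNumber(end, n);
        List<byte[]> points = new ArrayList<>();
        byte[] prev = start;
        for (int j = 1; j < k; j++) {
            BigInteger offset = hi.subtract(lo)
                    .multiply(BigInteger.valueOf(j))
                    .divide(BigInteger.valueOf(k));
            byte[] key = toBytes(lo.add(offset), n);
            boolean aboveLow = compare(key, prev) > 0;
            boolean belowHigh = end.length == 0 || compare(key, end) < 0;
            if (aboveLow && belowHigh) {
                points.add(key);
                prev = key;
            }
        }
        return points;
    }

    public static void main(String[] args) {
        byte[] start = "row100".getBytes(StandardCharsets.US_ASCII);
        byte[] end = "row200".getBytes(StandardCharsets.US_ASCII);
        // Print the cut points for a 4-way split in hex, since they are not printable.
        for (byte[] p : splitPoints(start, end, 4)) {
            StringBuilder hex = new StringBuilder();
            for (byte b : p) hex.append(String.format("%02x", b & 0xff));
            System.out.println(hex);
        }
    }
}
```

Each region's [start, end) range could then be replaced by the sub-ranges between consecutive cut points when building TableSplits. Evenly spaced keys only balance the work if rows are spread evenly over the key space, which is not guaranteed.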


Thanks,
Jonathan

