hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Purtell <apurt...@yahoo.com>
Subject Re: Splitting within regions
Date Thu, 08 May 2008 20:43:10 GMT
I have also been considering this issue but from the opposite direction
-- forcing splits of the table. From the perspective of I/O and loading
optimization, doesn't it make the most sense to have a 1:1 mapping of
regions to tasks? 

This issue I think will come up now and again if a user has tables that
hold a large number of items yet those items have small keys and column
data. 

Of course there are some problems with this, first and foremost the
problem that too many regions for the carrying capacity of the cluster
will take down all of the region servers via OOME in a cascading spiral
of death. Then there is the issue of making sure the key space of a
forced split is not too small as to underutilize the mapfile storage
available. Then there is the issue of key distributions being dependent
on the particular dataset and schema, so tweaking the global setting
hbase.hregion.max.filesize might not be a good idea. 

Just some random thoughts on the topic,

   - Andrew Purtell

--- "Jonathan M. Kupferman" <jkupferman@umail.ucsb.edu> wrote:

> Hi Everyone,
> I am currently attempting to run a Map Reduce job where the input
> comed from HBase. The input table has 22 regions, and thus creates 22
> map tasks. This however creates an issue since so few map tasks
> results in a poor distribution of labor on a cluster of 10+ machines,
> specifically since the amount of work required is highly variable
> depending on the region.
> 
> I would like to increase the number of map tasks at least 2 fold,
> the relevant code seems to be in TableInputFormat.
> 
> //Original code
>      Text[] startKeys = m_table.getStartKeys();
>       if(startKeys == null || startKeys.length == 0) {
>         throw new IOException("Expecting at least one region");
>       }
>       InputSplit[] splits = new InputSplit[startKeys.length];
>       for(int i = 0; i < startKeys.length; i++) {
>         splits[i] = new TableSplit(m_tableName, startKeys[i],
>             ((i + 1) < startKeys.length) ? startKeys[i + 1] : new
> Text());
>       }
> //end-original
> 
> //Modified code
>      Text[] startKeys = m_table.getStartKeys();
>       if(startKeys == null || startKeys.length == 0) {
>         throw new IOException("Expecting at least one region");
>       }
>       InputSplit[] splits = new InputSplit[startKeys.length*2];
>       for(int i = 0; i < startKeys.length; i++) {
>        Text halfsplit = new Text(""+Integer.parseInt(startKeys[i +  
> 1].toString())/2);
>         splits[i] = new TableSplit(m_tableName, startKeys[i],
> halfsplit);
>         splits[i+1] = new TableSplit(m_tableName, halfsplit ,((i + 1)  
> < startKeys.length) ? startKeys[i + 2] : new Text());
>       }
> //end-modified
> 
> Is seems like the required modifications would be something along the  
> lines the code written above. Is this the correct/best way to go about 
> 
> this?
> 
> 
> Thanks,
> Jonathan
> 
> 



      ____________________________________________________________________________________
Be a better friend, newshound, and 
know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

Mime
View raw message