hbase-dev mailing list archives

From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-987) We need a Hbase Partitioner for TableMapReduceUtil.initTableReduceJob MR Jobs
Date Mon, 10 Nov 2008 07:58:44 GMT

     [ https://issues.apache.org/jira/browse/HBASE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Billy Pearson updated HBASE-987:

    Attachment: 987.patch.txt

The only thing I have not addressed in this patch is the case where someone sets
the number of reducers higher than the number of regions in the table. The reduce
tasks beyond the region count will then have no work. Somewhere in the process we
should reduce the reducer count to the number of regions, as we do in TableMap.

I do not know where you would like me to do that. Maybe it can be done in
TableMapReduceUtil.initTableReduceJob. Any other ideas?

I need someone to review this with a larger number of servers than I have.
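
A minimal sketch of the reducer cap described above, in plain Java with no HBase
dependency. The names ReducerCountSketch and cappedReducers are hypothetical and
are not taken from 987.patch.txt. Where the cap should actually be applied is
still the open question.

```java
/** Hypothetical illustration only; not the code in 987.patch.txt. */
final class ReducerCountSketch {

    /**
     * Caps the requested reducer count at the table's region count, so that
     * no reduce task is started without a region to write to.
     * Assumption: in a real job the result would be handed to
     * JobConf.setNumReduceTasks; that wiring is not shown here.
     */
    static int cappedReducers(int requestedReducers, int regionCount) {
        if (requestedReducers < 1 || regionCount < 1) {
            throw new IllegalArgumentException("counts must be at least 1");
        }
        return Math.min(requestedReducers, regionCount);
    }

    public static void main(String[] args) {
        // 20 reducers requested against a table with 8 regions.
        System.out.println(cappedReducers(20, 8));
    }
}
```

A request that is already at or below the region count passes through unchanged.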

> We need a Hbase Partitioner for TableMapReduceUtil.initTableReduceJob MR Jobs
> -----------------------------------------------------------------------------
>                 Key: HBASE-987
>                 URL: https://issues.apache.org/jira/browse/HBASE-987
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Billy Pearson
>            Assignee: Billy Pearson
>            Priority: Minor
>             Fix For: 0.19.0
>         Attachments: 987.patch.txt
> When we run, say, 20 reducers, they each get ~1/20th of the data to output to the table.
> The problem for us on large import jobs is that the data gets sorted by key, and all the
> reducers pound one region at a time.
> We need to add onto the TableMapReduceUtil.initTableReduceJob method so it can set the
> partitioner and set the number of reducers = number of regions, as the table map does for maps.
> Then the Partitioner will send all the BatchUpdates for one region to one reducer.
> So we get a more even spread of writers to the regions. This would ensure that only one
> reducer will send updates to any one region, keeping any one region from getting more
> overloaded than the others.
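
The partitioning idea in the description can be sketched as follows. This is an
illustration in plain Java, not the code in 987.patch.txt. The class
RegionPartitionSketch and its methods are hypothetical. The sketch assumes the
table's region start keys are already available as a sorted byte[][] whose first
entry is the empty key, and that row keys compare as unsigned bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration only; not the code in 987.patch.txt. */
final class RegionPartitionSketch {

    /**
     * Returns the index of the region whose key range contains rowKey:
     * the last region whose start key is <= rowKey.
     * Assumes startKeys is sorted ascending and startKeys[0] is empty.
     */
    static int regionIndexFor(byte[][] startKeys, byte[] rowKey) {
        int lo = 0;
        int hi = startKeys.length - 1;
        int found = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(startKeys[mid], rowKey) <= 0) {
                found = mid;   // candidate region; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    /**
     * Maps a row key to a reducer. With numReducers == number of regions,
     * each region gets its own reducer. With fewer reducers, regions are
     * folded together, but every region still has exactly one writer.
     */
    static int partitionFor(byte[][] startKeys, byte[] rowKey, int numReducers) {
        return regionIndexFor(startKeys, rowKey) % numReducers;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four made-up regions starting at "", "g", "n" and "t".
        byte[][] startKeys = { bytes(""), bytes("g"), bytes("n"), bytes("t") };
        for (String row : new String[] { "apple", "grape", "zebra" }) {
            System.out.println(row + " -> reducer "
                    + partitionFor(startKeys, bytes(row), startKeys.length));
        }
    }
}
```

Because the partition depends only on which region a row key falls in, all
updates for one region go to the same reducer regardless of how the keys sort.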

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
