hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HBASE-675) Report correct server hosting a table split for assignment to for MR Jobs
Date Tue, 18 Nov 2008 07:03:44 GMT

     [ https://issues.apache.org/jira/browse/HBASE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-675.

    Resolution: Fixed

Committed j-d's patch with minor modifications from Arthur's patch. The biggest change in Arthur's
patch is a split mechanism that groups together regions that are on the same regionserver;
that's kinda nice and others will want to do that, but the map per region strikes me as more
general and therefore should be the default.
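The two split strategies described above can be sketched roughly as follows. This is a minimal illustration only, assuming each region is known together with the hostname of the regionserver carrying it; `SplitSketch`, `Region`, `Split`, `perRegion` and `perServer` are hypothetical names, not code from either patch or the HBase API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and types are illustrative, not the HBase API.
public class SplitSketch {
    // A region and the hostname of the regionserver currently carrying it.
    record Region(String name, String host) {}

    // A split: the regions one map task will scan, plus the host it should run on.
    record Split(List<String> regions, String host) {}

    // Default strategy: one split (hence one map task) per region.
    static List<Split> perRegion(List<Region> regions) {
        List<Split> splits = new ArrayList<>();
        for (Region r : regions) {
            splits.add(new Split(List.of(r.name()), r.host()));
        }
        return splits;
    }

    // Alternative strategy: group all regions on the same regionserver into one split.
    static List<Split> perServer(List<Region> regions) {
        // LinkedHashMap keeps servers in the order they are first seen.
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (Region r : regions) {
            byHost.computeIfAbsent(r.host(), h -> new ArrayList<>()).add(r.name());
        }
        List<Split> splits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byHost.entrySet()) {
            splits.add(new Split(e.getValue(), e.getKey()));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region("r1", "hostA"),
            new Region("r2", "hostB"),
            new Region("r3", "hostA"));
        System.out.println(perRegion(regions).size()); // 3 splits, one per region
        System.out.println(perServer(regions).size()); // 2 splits, one per regionserver
    }
}
```

Per-region splits give the scheduler more, smaller tasks to place, while grouping per server produces fewer, larger tasks tied to one host.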

To test, I filled a table, then ran the rowcounter, which uses TableInputFormat. I saw in Ganglia
that we did about half the network traffic on a small 4-node cluster. You can also see in the
TaskTracker map UI that each map runs on the same node as the regionserver carrying the region
being processed.

> Report correct server hosting a table split for assignment to for MR Jobs
> -------------------------------------------------------------------------
>                 Key: HBASE-675
>                 URL: https://issues.apache.org/jira/browse/HBASE-675
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>    Affects Versions: 0.2.0
>            Reporter: Billy Pearson
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.19.0
>         Attachments: 675-v2.patch, arthur.patch, hbase-675-v1.patch, network.png
> Currently we return a null String array to the MR framework, so it uses a random node for MR job assignment.
> class: org.apache.hadoop.hbase.mapred.TableSplit
> method: getLocations()
> We should now be able to query .META. for the current hostname of the server hosting the region in question.
> This will help with scaling: there will be less cross-server communication, removing bandwidth as a bottleneck.
> As a side effect, fixing this will help keep region servers from being overloaded by lots of MR clients all pulling from the same region server while there is local work for them to do.
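The change the issue asks for, reporting the hosting server from getLocations() instead of a null array, can be sketched as below. This is a hypothetical, self-contained illustration rather than the committed TableSplit code: `LocalitySplitSketch`, `Split` and `makeSplit` are made-up names, and a plain region-to-hostname map stands in for the .META. lookup.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical sketch: stands in for a table split; not the real HBase TableSplit.
public class LocalitySplitSketch {
    static final class Split {
        private final String regionName;
        private final String regionLocation; // hostname of the hosting regionserver, or null if unknown

        Split(String regionName, String regionLocation) {
            this.regionName = regionName;
            this.regionLocation = regionLocation;
        }

        String getRegionName() {
            return regionName;
        }

        // Hosts the MR framework should prefer when scheduling the map for this split.
        // In this sketch an empty array means "no preference", so any node may be used.
        String[] getLocations() {
            return regionLocation == null ? new String[0] : new String[] {regionLocation};
        }
    }

    // Builds a split, taking the host from a region -> hostname map that stands in
    // for a catalog (.META.) lookup made when the splits are computed.
    static Split makeSplit(String region, Map<String, String> catalog) {
        return new Split(region, catalog.get(region));
    }

    public static void main(String[] args) {
        Map<String, String> catalog = Map.of("r1", "hostA");
        System.out.println(Arrays.toString(makeSplit("r1", catalog).getLocations())); // [hostA]
        System.out.println(Arrays.toString(makeSplit("r2", catalog).getLocations())); // []
    }
}
```

Returning the hostname lets the scheduler place the map on the node already serving the region, which is the locality effect described in the resolution comment.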

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
