hbase-dev mailing list archives

From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-675) Report correct server hosting a table split for assignment to for MR Jobs
Date Tue, 15 Jul 2008 14:03:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613609#action_12613609 ]

Billy Pearson commented on HBASE-675:
-------------------------------------

Since the load of MR jobs falls on the region server rather than the datanode, moving the task close
to the region server would be ideal.

Otherwise, the client/MR task has to talk to the region server over
the network for each request,
then the region server has to talk to the datanode for the data,
the datanode sends the data back, and the region server returns it to the map task.
That's 4 hops for each request.

If the task ran on the same server that hosts the region, we would
have only the region server to datanode hops, speeding up each request.
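
The hop arithmetic above can be written out as a small sketch. This is only a model of the counting in this comment, not a measurement: the class and method names are made up for illustration, and it assumes each request and each response counts as one network hop.

```java
// Illustrative model only; nothing here is an HBase or Hadoop class.
class HopCountSketch {

    // Network hops for one read, following the counting used in this comment.
    static int hopsPerRequest(boolean taskOnRegionServerHost) {
        // Task <-> region server: request plus response, unless they share a host.
        int taskToRegionServer = taskOnRegionServerHost ? 0 : 2;
        // Region server <-> datanode: request plus response in both cases.
        int regionServerToDataNode = 2;
        return taskToRegionServer + regionServerToDataNode;
    }

    // Total hops for a number of requests issued by one map task.
    static long totalHops(long requests, boolean taskOnRegionServerHost) {
        return requests * hopsPerRequest(taskOnRegionServerHost);
    }

    public static void main(String[] args) {
        System.out.println("remote task: " + hopsPerRequest(false) + " hops per request");
        System.out.println("local task:  " + hopsPerRequest(true) + " hops per request");
    }
}
```

Under this model a task placed on the region server's host halves the hops per request, from 4 to 2.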

Assuming the average setup per server is going to be:
hadoop datanode
hbase region server
X mapper tasks
X reducer tasks

Then having the local task work against the local region server would also help spread the load
and help avoid overloading any one region server.


> Report correct server hosting a table split for assignment to for MR Jobs
> -------------------------------------------------------------------------
>
>                 Key: HBASE-675
>                 URL: https://issues.apache.org/jira/browse/HBASE-675
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Billy Pearson
>            Priority: Minor
>             Fix For: 0.3.0
>
>
> Currently we return a null String array to the MR framework, so it uses a random node for
> MR job assignment.
> class: org.apache.hadoop.hbase.mapred.tableSplit
> function getLocations()
> We should now be able to query the meta for the current host name of the server hosting
> the region in question.
> This will help with scaling, as there will be less cross-server communication, removing
> bandwidth as a bottleneck.
> As a side effect, fixing this will help keep region servers from being overloaded by lots of
> MR clients all pulling from the same region server while there is work local for them to do.
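
A rough sketch of the idea in the issue description, kept stand-alone so it runs without HBase or Hadoop on the classpath. The `RegionSplit` class, the `splitFor` helper, the host name, and the map standing in for a meta lookup are all hypothetical; only the shape of `getLocations()` (a `String[]` of preferred host names) follows the method named in the issue.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone illustration; none of these types are HBase or Hadoop classes.
class SplitLocationSketch {

    // Hypothetical stand-in for a table split covering one region.
    static final class RegionSplit {
        private final String regionName;
        private final String hostName; // null when the hosting server is unknown

        RegionSplit(String regionName, String hostName) {
            this.regionName = regionName;
            this.hostName = hostName;
        }

        String getRegionName() {
            return regionName;
        }

        // Host names the scheduler may prefer for this split.
        // An empty array means "no preference", so any node may be chosen.
        String[] getLocations() {
            return hostName == null ? new String[0] : new String[] { hostName };
        }
    }

    // Hypothetical meta lookup: a map from region name to hosting server.
    static RegionSplit splitFor(String regionName, Map<String, String> meta) {
        return new RegionSplit(regionName, meta.get(regionName));
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("region-1", "rs1.example.com"); // made-up region and host names

        RegionSplit split = splitFor("region-1", meta);
        System.out.println(String.join(",", split.getLocations()));
    }
}
```

Returning an empty array when the host is unknown keeps the fallback behaviour of letting the framework pick any node.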

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

