hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Wed, 17 May 2006 05:06:07 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12412092 ] 

Owen O'Malley commented on HADOOP-195:
--------------------------------------

I wrote a network bandwidth tester that uses plain Java sockets to connect every node to every
other node. The application waits until all of the servers are up and then starts sending data
(10g/node). On my cluster, which is currently at 202 nodes, it took an average of
1423 seconds (mean of 1630) to finish the transfer between nodes. That is substantially faster
than Hadoop's shuffle (7 hours?) and means that we have a long way to go in terms of shuffle
optimization.
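
A minimal sketch of the kind of socket test described above, shrunk to a single sender and a single receiver on the loopback interface. It is not the tester used for the numbers in this comment: the class name, method names, buffer size, and payload size are all assumptions made for illustration, and a real all-to-all run would start one receiver per node and one sender per node pair.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration only; not the tester referred to in the comment.
public class SocketBandwidthSketch {

    /** Accepts one connection and returns the number of bytes read until EOF. */
    public static long drain(ServerSocket server) throws IOException {
        try (Socket s = server.accept(); InputStream in = s.getInputStream()) {
            byte[] buf = new byte[64 * 1024];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        }
    }

    /** Connects to host:port and writes exactly `bytes` zero bytes. */
    public static void send(String host, int port, long bytes) throws IOException {
        try (Socket s = new Socket(host, port); OutputStream out = s.getOutputStream()) {
            byte[] buf = new byte[64 * 1024];
            long left = bytes;
            while (left > 0) {
                int n = (int) Math.min(buf.length, left);
                out.write(buf, 0, n);
                left -= n;
            }
        }
    }

    /** Runs one sender against one receiver on loopback; returns bytes received. */
    public static long transferOverLoopback(long bytes) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // The receiver runs on another thread so the sender cannot block forever.
            CompletableFuture<Long> received = CompletableFuture.supplyAsync(() -> {
                try {
                    return drain(server);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            send(server.getInetAddress().getHostAddress(), server.getLocalPort(), bytes);
            return received.get();
        }
    }

    /** Throughput in MB/s, with 1 MB taken as 10^6 bytes. */
    public static double throughputMBps(long bytes, double seconds) {
        return bytes / 1e6 / seconds;
    }

    public static void main(String[] args) throws Exception {
        long bytes = 16L * 1024 * 1024; // small payload so the demo finishes quickly
        long start = System.nanoTime();
        long got = transferOverLoopback(bytes);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("received %d bytes at %.1f MB/s%n", got, throughputMBps(got, secs));
    }
}
```

If "10g" is read as 10^10 bytes (an assumption), `throughputMBps(10_000_000_000L, 1423)` gives roughly 7 MB/s per node for the 1423 second figure quoted above.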

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: MapFileSimulator.java, data-transfer-chart.pdf, mapfilesimulator-big.txt,
>               mapfilesimulator-sort2.txt, netstat.log, netstat.xls
>
> The map output should be transferred via http instead of rpc, because rpc is very slow for
> this application and the timeout behavior is suboptimal. (The server sends data and the client
> ignores it because it took more than 10 seconds to be received.)
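
As a rough sketch of what fetching map output over http could look like, the following uses only JDK classes. It is not the patch for this issue: the `/mapOutput` path, the class and method names, and the timeout value are hypothetical. The point it illustrates is that `setReadTimeout` applies to each blocking read rather than to the whole transfer, so a large response that keeps making progress is not thrown away just because the total time passes a fixed limit.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; names and the /mapOutput path are assumptions.
public class HttpMapOutputSketch {

    /** Starts a loopback HTTP server that returns `payload` (must be non-empty). */
    public static HttpServer serve(byte[] payload) throws IOException {
        InetSocketAddress addr = new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0);
        HttpServer server = HttpServer.create(addr, 0);
        server.createContext("/mapOutput", exchange -> {
            // A positive length sends a fixed Content-Length response.
            exchange.sendResponseHeaders(200, payload.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(payload);
            }
        });
        server.start();
        return server;
    }

    /** Fetches the payload; the timeout bounds connecting and each individual read. */
    public static byte[] fetch(int port, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI
                .create("http://127.0.0.1:" + port + "/mapOutput")
                .toURL()
                .openConnection(Proxy.NO_PROXY);
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "key1\tvalue1\n".getBytes(StandardCharsets.UTF_8);
        HttpServer server = serve(payload);
        try {
            byte[] got = fetch(server.getAddress().getPort(), 10_000);
            System.out.println("fetched " + got.length + " bytes");
        } finally {
            server.stop(0);
        }
    }
}
```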

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

