hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Tue, 09 May 2006 16:54:05 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378655 ] 

Owen O'Malley commented on HADOOP-195:

Since there is obviously interest in my benchmark, here is an update:

I reran my sort test yesterday with:
   1. fewer reduces (2/node) (HADOOP-202)
   2. the map ids replaced with integers (HADOOP-200)
   3. the number of server threads for map output serving set to 20

I sorted 1760 gig of data on 179 nodes in 18.6 hours, which is much better than before.
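
For scale, the figures above can be turned into rough throughput numbers. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes decimal units (1 gig = 1000 MB) and an even spread of data across nodes, neither of which is stated in the comment.

```python
def sort_throughput(total_gb, nodes, hours):
    """Return rough aggregate and per-node rates for a sort run.

    Assumes decimal units (1 GB = 1000 MB) and an even split across nodes;
    both are simplifications, not facts from the benchmark itself.
    """
    gb_per_hour = total_gb / hours                  # cluster-wide rate
    gb_per_node_hour = gb_per_hour / nodes          # average per node
    mb_per_node_sec = gb_per_node_hour * 1000 / 3600
    return gb_per_hour, gb_per_node_hour, mb_per_node_sec


# Figures from the run described above: 1760 gig, 179 nodes, 18.6 hours.
cluster_rate, node_rate, node_mb_s = sort_throughput(1760, 179, 18.6)
print(f"cluster: {cluster_rate:.1f} GB/h")
print(f"per node: {node_rate:.2f} GB/h ({node_mb_s:.3f} MB/s)")
```

These are rates for sorted output, not raw disk or network throughput, since a sort reads, shuffles, and writes the data more than once.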

I had 20 reduce tasks fail and reexecute themselves (last original reduce finished in ~16.5

2 of those tasks were assigned to the same node and were the only two tasks running for the
last hour, which clearly shows that we need speculative reduces.

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: netstat.log, netstat.xls
> The map output should be transferred via http instead of rpc, because rpc is very slow
> for this application and the timeout behavior is suboptimal. (The server sends the data
> and the client ignores it because it took more than 10 seconds to be received.)
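
The timeout problem described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Hadoop's actual rpc or http code: both function names are hypothetical, the transfer is modeled as a list of per-chunk arrival gaps, and the streaming variant assumes a per-read idle timeout (the usual socket read timeout behavior) rather than a limit on the whole call.

```python
CALL_TIMEOUT_S = 10.0  # the 10-second limit mentioned in the description


def all_or_nothing_fetch(chunk_gaps_s, chunk_bytes, timeout_s=CALL_TIMEOUT_S):
    """Model a call whose whole response must arrive within timeout_s.

    If the total transfer time exceeds the timeout, the client ignores
    everything the server sent, so zero bytes are usable.
    """
    total_time = sum(chunk_gaps_s)
    if total_time > timeout_s:
        return 0
    return chunk_bytes * len(chunk_gaps_s)


def streaming_fetch(chunk_gaps_s, chunk_bytes, idle_timeout_s=CALL_TIMEOUT_S):
    """Model a streamed transfer where the timeout applies to each read.

    The transfer only stops if a single chunk takes longer than the
    idle timeout; chunks received before that point are kept.
    """
    delivered = 0
    for gap in chunk_gaps_s:
        if gap > idle_timeout_s:
            break
        delivered += chunk_bytes
    return delivered


# Five chunks arriving 3 seconds apart: 15 seconds in total.
gaps = [3.0] * 5
print(all_or_nothing_fetch(gaps, 1024))
print(streaming_fetch(gaps, 1024))
```

In this model a large map output that always takes longer than the limit can never be fetched by the all-or-nothing call, no matter how often it is retried, while the streamed transfer completes as long as data keeps flowing.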

This message is automatically generated by JIRA.