hadoop-common-dev mailing list archives

From "paul sutter (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Tue, 09 May 2006 19:25:05 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378703 ] 

paul sutter commented on HADOOP-195:


Are you still using 64,000 mappers? If so, wouldn't your average map output file size be around

I'd suggest doing a /= 10 or /= 50 on those mappers.

If you had 1880 mappers and 376 reducers, your map output files would be 2.8MB each, which
might be better than 80KB.

You might try 1800 mappers and 350 reducers, so that you have spare capacity on your nodes
for failed mappers or reducers (giving you 3MB map output files).
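
The sizes above follow from dividing the total intermediate data by the number of map output files (one file per mapper per reducer). A minimal sketch of that arithmetic is below. The total is inferred from the 1880 x 376 x 2.8MB figures rather than measured, the function name is made up for illustration, and applying 376 reducers to the 64,000-mapper case is an assumption.

```python
def map_output_file_mb(total_mb, mappers, reducers):
    """Average size of one map output file, assuming each mapper writes
    one file per reducer and the data is spread evenly."""
    return total_mb / (mappers * reducers)


# Total intermediate data implied by the figures in this comment
# (1880 mappers x 376 reducers x 2.8 MB per file). This is an inference,
# not a measured value.
TOTAL_MB = 1880 * 376 * 2.8

# The 64,000-mapper row assumes the reducer count stays at 376.
for mappers, reducers in [(64000, 376), (1880, 376), (1800, 350)]:
    size = map_output_file_mb(TOTAL_MB, mappers, reducers)
    print(f"{mappers:>6} mappers x {reducers} reducers -> {size:.3f} MB per file")
```

Under these assumptions the 64,000-mapper case comes out a little above 0.08MB per file, in line with the 80KB figure, and the 1800 x 350 case comes out just over 3MB.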

Has anyone measured the map-task creation overhead? Does anyone know the file creation/deletion
overhead on Linux? Each of those little files is created, written, read, and deleted twice
in the current code, each time at that tiny file size.
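
One rough way to get a number for the file overhead is a micro-benchmark along these lines. This is only a sketch, not how Hadoop handles its files: the function name, file names, and the 80KB default are assumptions for illustration, and because the files will mostly stay in the page cache it measures filesystem call overhead more than disk throughput.

```python
import os
import tempfile
import time


def time_small_file_cycle(num_files=200, size_bytes=80 * 1024):
    """Return the average seconds spent to create, write, read, and delete
    one small file, measured over num_files files in a temp directory."""
    payload = b"x" * size_bytes
    with tempfile.TemporaryDirectory() as tmp:
        start = time.perf_counter()
        for i in range(num_files):
            path = os.path.join(tmp, f"part-{i}.out")  # hypothetical file name
            with open(path, "wb") as f:
                f.write(payload)
            with open(path, "rb") as f:
                f.read()
            os.remove(path)
        elapsed = time.perf_counter() - start
    return elapsed / num_files


if __name__ == "__main__":
    per_file = time_small_file_cycle()
    print(f"average create/write/read/delete cycle: {per_file * 1e6:.1f} microseconds")
```

Running it with a larger size_bytes (for example 3 * 1024 * 1024) gives a comparison point for the bigger map output files suggested above.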



> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: netstat.log, netstat.xls
> The map output data should be transferred via HTTP instead of RPC, because
RPC is very slow for this application and the timeout behavior is suboptimal. (The server sends
data and the client ignores it because it took more than 10 seconds to be received.)

This message is automatically generated by JIRA.