hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Tue, 09 May 2006 19:52:05 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378715 ] 

Doug Cutting commented on HADOOP-195:

> Are you still using 64,000 mappers? If so, wouldn't your average map output file size
> be around 80KB?

Yes.  But this comes from having a map task per input file block, which permits map tasks
to be placed on nodes where their data is local.  Simply reducing the number of map tasks
would defeat that important optimization.  Better to instead increase the dfs block size to
128MB, as Eric suggested.  This would increase the map outputs to about 340KB (with 376 reducers).
But once we move to a larger cluster, with 1000 or more reducers, the map outputs will
again become small.  So optimizing for small map outputs will remain important, even as we
increase the dfs block size.
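
The arithmetic behind these figures can be sketched as follows.  This is only a rough
illustration, not anything from the Hadoop code base: it assumes each map task emits about
as many bytes as it reads (one dfs block) and that the output is split evenly across
reducers.  The function names and the use of decimal units (1MB = 1,000,000 bytes) are
choices made for this sketch.

```python
MB = 1_000_000  # decimal units; an assumption that reproduces the "340KB" figure
KB = 1_000


def map_output_bytes(block_size_bytes: int, num_reducers: int) -> float:
    """Average size of one per-reducer map output.

    Assumes map output volume ~= input block size and an even split
    across reducers (both are simplifying assumptions of this sketch).
    """
    return block_size_bytes / num_reducers


def implied_block_bytes(output_bytes: int, num_reducers: int) -> int:
    """Block size implied by a given per-reducer output size (same assumptions)."""
    return output_bytes * num_reducers


if __name__ == "__main__":
    # 128MB blocks with 376 reducers, then with 1000 reducers.
    for reducers in (376, 1000):
        size_kb = map_output_bytes(128 * MB, reducers) / KB
        print(f"128MB block, {reducers} reducers: ~{size_kb:.0f}KB per map output")

    # Block size implied by the quoted "around 80KB" figure with 376 reducers.
    print(f"80KB outputs imply ~{implied_block_bytes(80 * KB, 376) / MB:.0f}MB blocks")
```

Under these assumptions, going from 376 to 1000 reducers with a 128MB block brings each map
output back down to about 128KB, which is why small outputs stay relevant.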

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: netstat.log, netstat.xls
> The data transfer of the map output should be done via http instead of rpc, because
> rpc is very slow for this application and the timeout behavior is suboptimal. (The server sends
> data and the client ignores it because it took more than 10 seconds to be received.)

This message is automatically generated by JIRA.
