hadoop-common-dev mailing list archives

From "paul sutter (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Wed, 10 May 2006 23:00:05 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378992 ] 

paul sutter commented on HADOOP-195:


A couple of small ideas:

- Could you fit the tempfiles in a RAM disk? This would just be a hack to determine whether
the disk physics of small files are a factor here, both on the mapper end and the reducer
end. Note that you need 2X the space on the reducer end, because it keeps two copies of the
data around in small-file-form. 

- If small files prove to be a problem (as I suspect) and, as Doug suggests, we want to
optimize for that case, perhaps the best approach would be to send the map output data
directly to the reducer and have the reducer write it to disk in a log-structured format,
maintaining a list of segments that were abandoned mid-stream and should be ignored during
processing. That way, all disk access would be sequential.
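A minimal sketch of the log-structured idea in the second point, assuming a reducer appends every incoming chunk of map output to a single file, tagged with a segment id, and writes an "abandon" marker when a transfer dies mid-stream. The class name `SegmentLog`, the record header layout, and the method names are hypothetical illustrations, not Hadoop APIs.

```python
import io
import struct
from collections import defaultdict

# Hypothetical record header: segment id (uint32), record kind (uint8), payload length (uint32).
HEADER = struct.Struct(">IBI")
DATA, ABANDON = 0, 1


class SegmentLog:
    """Append-only, reducer-side log of map-output segments (illustrative sketch only)."""

    def __init__(self, stream):
        self.stream = stream  # any seekable binary file object

    def append(self, segment_id, chunk):
        # Every chunk goes to the end of the log, so disk writes stay sequential
        # even when several map outputs arrive concurrently and interleave.
        self.stream.write(HEADER.pack(segment_id, DATA, len(chunk)))
        self.stream.write(chunk)

    def abandon(self, segment_id):
        # Record the failed transfer in the log itself; chunks already written
        # for this segment stay in place and are skipped when the log is read.
        self.stream.write(HEADER.pack(segment_id, ABANDON, 0))

    def read_segments(self):
        """Scan the log front to back and return {segment_id: bytes} for usable segments."""
        self.stream.seek(0)
        data = defaultdict(bytearray)
        abandoned = set()
        while True:
            header = self.stream.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of log (or a torn header from an interrupted write)
            seg, kind, length = HEADER.unpack(header)
            if kind == ABANDON:
                abandoned.add(seg)
            else:
                data[seg] += self.stream.read(length)
        self.stream.seek(0, io.SEEK_END)  # later appends continue at the tail
        return {seg: bytes(buf) for seg, buf in data.items() if seg not in abandoned}


log = SegmentLog(io.BytesIO())
log.append(1, b"map1-a")
log.append(2, b"map2-a")  # interleaved with map 1's transfer
log.append(1, b"map1-b")
log.abandon(2)            # map 2's transfer failed mid-stream
print(log.read_segments())
```

Here the abandoned segment's bytes are never rewritten or deleted during the transfer phase; the cost moves to the processing step, which regroups chunks by segment id during a single sequential scan. A retried transfer would need a fresh segment id in this sketch.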

Thanks. Sorry for the volume of responses here, but it's an area of great interest to us.


> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: netstat.log, netstat.xls
> The map output should be transferred via http instead of rpc, because rpc is very slow
> for this application and its timeout behavior is suboptimal (the server sends data and the
> client ignores it because it took more than 10 seconds to arrive).
