hadoop-common-dev mailing list archives

From "paul sutter (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Sun, 07 May 2006 01:15:21 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378265 ] 

paul sutter commented on HADOOP-195:

I checked with Robert Ramey, who held the PennySort record for a while. PennySort runs
on a single commodity machine, so we should all feel comfortable with the comparison.

His sort does about 1GB/minute on an ordinary machine.

So 1880GB on 188 machines is 10GB per machine, which should be 10 minutes sort time
+ ?? copy time (copy should be faster than sort).

If Owen's example is taking 1 hour sort time plus 7 hours copy time, it's apparent that the
biggest win is in improving the copy, and the next most important task is improving the sort.
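
As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes the
data is spread evenly across the machines, that every machine sorts in parallel at the
quoted 1GB/minute, and it takes the 1 hour / 7 hour figures at face value. The function and
variable names are made up for illustration.

```python
# Figures quoted in this comment; nothing here is measured.
TOTAL_DATA_GB = 1880
MACHINES = 188
SORT_RATE_GB_PER_MIN = 1.0   # single-machine rate quoted for Ramey's sort

OBSERVED_SORT_MIN = 60       # "1 hour sort time"
OBSERVED_COPY_MIN = 7 * 60   # "7 hours copy time"


def estimated_sort_minutes(total_gb, machines, rate_gb_per_min):
    """Ideal sort time, assuming even data distribution and fully parallel sorting."""
    per_machine_gb = total_gb / machines
    return per_machine_gb / rate_gb_per_min


ideal_sort = estimated_sort_minutes(TOTAL_DATA_GB, MACHINES, SORT_RATE_GB_PER_MIN)
sort_slowdown = OBSERVED_SORT_MIN / ideal_sort
copy_share = OBSERVED_COPY_MIN / (OBSERVED_SORT_MIN + OBSERVED_COPY_MIN)

print(f"ideal sort time: {ideal_sort:.0f} min")
print(f"observed sort is {sort_slowdown:.0f}x the ideal")
print(f"copy is {copy_share:.1%} of observed sort+copy time")
```

Under these assumptions the copy accounts for 7 of the 8 hours, which is why it comes first.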

David Cossock's sort at Yahoo is very fast, so that is one option if Yahoo is interested.

If Yahoo isn't interested in contributing that, Robert Ramey may be open to doing consulting
work in this area. Once the copy times are improved, we'd be interested in helping sponsor
sort performance improvements.

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3

> The map output should be transferred via http instead of rpc, because
> rpc is very slow for this application and the timeout behavior is suboptimal. (The server sends
> data and the client ignores it because it took more than 10 seconds to be received.)
