hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Sun, 07 May 2006 19:10:21 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378311 ] 

eric baldeschwieler commented on HADOOP-195:
--------------------------------------------

Good list, Paul!

Some very simple changes should have a big impact on sort behavior as you observe. We'll start
working on that once it becomes the bottleneck.

One simple way to increase the file sizes is to reduce the number of reduces significantly
and increase the DFS block size to 64 or 128 MB.

We'll play with these (if I can convince Owen).  I think we should bump the Hadoop default
block size to 128 MB. That is still small enough to replicate quickly, but it will significantly
reduce the number of map tasks when you just want to scan data.  Reduce the number of reduces
as well and we'll have significantly larger transactions.
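
To put rough numbers on the block size point, here is a minimal sketch of the arithmetic. It assumes one map task per DFS block, which is a simplification and not something stated above. The class name, the 10 GB input size and the 32 MB comparison value are hypothetical, chosen only for illustration.

```java
public class BlockSizeSketch {
    static final long MB = 1024L * 1024L;

    // Assumption: exactly one map task is scheduled per DFS block of input.
    static long mapTasks(long inputBytes, long blockBytes) {
        if (blockBytes <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        // Ceiling division: a partial trailing block still needs its own task.
        return (inputBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long input = 10L * 1024L * MB; // hypothetical 10 GB scan
        // 32 MB is an illustrative smaller size; 64 and 128 MB are the sizes discussed.
        for (long blockMb : new long[] {32, 64, 128}) {
            System.out.println(blockMb + " MB blocks -> "
                    + mapTasks(input, blockMb * MB) + " map tasks");
        }
    }
}
```

Under that assumption, each doubling of the block size halves the task count for a pure scan, and each task reads twice as much data.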

All that said, I think we are probably uncovering things in the RPC layer (and server threading)
more than basic network issues, since we're running on a decent network and not even beginning
to approach saturating it.  But we'll certainly play with "setTcpNoDelay."  It will be interesting
to see if that moves things along.
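
For reference, "setTcpNoDelay" is the standard java.net.Socket method that disables Nagle's algorithm on a connection. A minimal sketch of the call on a plain loopback socket follows. This is not the Hadoop RPC code, and the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpNoDelaySketch {
    // Opens a loopback connection, sets TCP_NODELAY on the client side,
    // and returns the value the socket reports back.
    static boolean connectWithNoDelay(boolean noDelay) throws IOException {
        try (ServerSocket server =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                     new Socket(server.getInetAddress(), server.getLocalPort())) {
            // true disables Nagle's algorithm, so small writes are sent
            // immediately instead of being coalesced into larger segments.
            client.setTcpNoDelay(noDelay);
            return client.getTcpNoDelay();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TCP_NODELAY enabled: " + connectWithNoDelay(true));
    }
}
```

Whether this helps depends on the write pattern: it mainly matters when a protocol sends many small messages and waits for replies, which may or may not be the case here.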

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement

>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3

>
> The map output should be transferred via http instead of rpc, because rpc is very slow
> for this application and the timeout behavior is suboptimal. (The server sends data and the
> client ignores it because it took more than 10 seconds to be received.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

