hadoop-common-dev mailing list archives

From "Sameer Paranjpye (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Wed, 24 May 2006 17:48:30 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12413154 ] 

Sameer Paranjpye commented on HADOOP-195:
-----------------------------------------

Some more implementation notes:

1) The ReduceTaskRunner queries for map output locations concurrently with the copies.
Discovered outputs are pushed into a queue from which they are pulled by the copiers.
2) Added a new RPC method 'callRaw' which bypasses RPC's connection sharing mechanism. Requests
are sent down a socket owned exclusively by the caller. The caller is free to do what he pleases
with the connection once the request is complete. This implementation closes the connection
immediately so that we don't get lots of idle threads in the tasktrackers.
3) Early runs showed some load imbalances in the copies, with lots of reduces pounding a single
tasktracker. I've tried to address this by
  a) ensuring that a reduce only copies 1 output from a given host at any time
  b) introducing a backoff if a copy from some host fails
This seems to work fine for single-node and small clusters, because the number of outputs is
also small enough that a lot of parallelism isn't needed.
4) Some tasktrackers were running out of memory when 100s of clients connected to them at
once. This happened because not enough memory was available to create the threads needed to
handle the concurrent connections. This caused tasktrackers to go into a bad state and not
service any more client requests while still sending heartbeats to the jobtracker. Tasktrackers
now handle OutOfMemory errors gracefully. Of course, this condition no longer manifested once
the load balancing code mentioned in 3) was introduced.
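
For readers following along, notes 1) and 3) can be illustrated with a minimal sketch. It is not
the code from the patch: the class name CopyScheduler, its methods, and the MapOutput type are
hypothetical, and the exponential shape of the backoff is an assumption (the comment above only
says that a backoff was introduced). Time is passed in explicitly so the behavior is easy to check.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch only; none of these names come from the HADOOP-195 patch. */
class CopyScheduler {
    /** A discovered map output: which map produced it and which host serves it. */
    static final class MapOutput {
        public final int mapId;
        public final String host;
        MapOutput(int mapId, String host) { this.mapId = mapId; this.host = host; }
    }

    private final Queue<MapOutput> pending = new ArrayDeque<>();   // discovered, not yet copied
    private final Set<String> busyHosts = new HashSet<>();          // hosts with a copy in flight
    private final Map<String, Long> retryAfter = new HashMap<>();   // host -> earliest retry (ms)
    private final Map<String, Integer> failures = new HashMap<>();  // host -> consecutive failures
    private final long baseBackoffMs;

    CopyScheduler(long baseBackoffMs) { this.baseBackoffMs = baseBackoffMs; }

    /** Called by the thread that queries for map output locations. */
    synchronized void addDiscovered(MapOutput out) { pending.add(out); }

    /** Called by a copier; returns null when no queued output is currently eligible. */
    synchronized MapOutput next(long nowMs) {
        for (Iterator<MapOutput> it = pending.iterator(); it.hasNext();) {
            MapOutput out = it.next();
            boolean backingOff = retryAfter.getOrDefault(out.host, 0L) > nowMs;
            if (!busyHosts.contains(out.host) && !backingOff) {
                it.remove();
                busyHosts.add(out.host);  // rule (a): at most one copy per host at a time
                return out;
            }
        }
        return null;
    }

    /** Frees the host and clears any failure history for it. */
    synchronized void copySucceeded(MapOutput out) {
        busyHosts.remove(out.host);
        failures.remove(out.host);
        retryAfter.remove(out.host);
    }

    /** Rule (b): frees the host, delays further copies from it, and requeues the output. */
    synchronized void copyFailed(MapOutput out, long nowMs) {
        busyHosts.remove(out.host);
        int n = failures.merge(out.host, 1, Integer::sum);
        // Assumed policy: the delay doubles per consecutive failure, with a capped shift.
        long delay = baseBackoffMs << Math.min(n - 1, 10);
        retryAfter.put(out.host, nowMs + delay);
        pending.add(out);
    }
}
```

In this sketch each copier thread would loop on next(System.currentTimeMillis()), perform the
transfer, and then report copySucceeded or copyFailed. A copier that gets null back would sleep
briefly before asking again.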







> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: MapFileSimulator.java, data-transfer-chart.pdf, mapfilesimulator-big.txt,
>               mapfilesimulator-sort2.txt, netstat.log, netstat.xls, parallel-copiers.txt
>
> The map output should be transferred via http instead of rpc, because rpc is very slow
> for this application and the timeout behavior is suboptimal. (The server sends data and
> the client ignores it because it took more than 10 seconds to be received.)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

