hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-195) transfer map output transfer with http instead of rpc
Date Fri, 05 May 2006 17:38:28 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-195?page=comments#action_12378080 ] 

Owen O'Malley commented on HADOOP-195:
--------------------------------------

Looking at the logs of my sort benchmark on 188 nodes, each reduce fetches and processes
1 GB of data from ~64k maps. After all of the maps are done, a reduce takes 8 hours to run,
7 of which are spent fetching the map outputs.

Timing of getFile calls that completed (average transfer ~15 KB):
   Max: 76 seconds
   Avg: 385 ms
   Median: 45 ms
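   Distribution, bucketed by int(log_2(ms)):
     bucket     range (ms)       count
          0              1         149
          1            2-3       41449
          2            4-7       96305
          3           8-15      101595
          4          16-31    13060775
          5          32-63    71232197
          6         64-127     9675008
          7        128-255     4569403
          8        256-511     5688185
          9       512-1023     5196811
         10      1024-2047     3878971
         11      2048-4095     4267733
         12      4096-8191     1855209
         13     8192-16383      411456
         14    16384-32767       70594
         15    32768-65535       24182
         16   65536-131071           1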
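The histogram can be sanity-checked with a short script. This is a Python sketch, not code from Hadoop, and it assumes bucket k holds calls with 2^k <= ms < 2^(k+1) and that no recorded call was under 1 ms. It finds the bucket holding the median call, bounds the mean using each bucket's smallest and largest value, and measures how much of the traffic sits in the slow tail.

```python
# Latency histogram of completed getFile calls: bucket k counts calls whose
# duration satisfies int(log2(ms)) == k, i.e. 2**k <= ms < 2**(k+1).
# Assumption: every recorded duration is at least 1 ms.
HISTOGRAM = {
    0: 149, 1: 41449, 2: 96305, 3: 101595,
    4: 13060775, 5: 71232197, 6: 9675008, 7: 4569403,
    8: 5688185, 9: 5196811, 10: 3878971, 11: 4267733,
    12: 1855209, 13: 411456, 14: 70594, 15: 24182, 16: 1,
}


def total_calls(hist):
    """Number of completed calls recorded in the histogram."""
    return sum(hist.values())


def median_bucket(hist):
    """Return the bucket that contains the median call."""
    half = total_calls(hist) / 2
    running = 0
    for k in sorted(hist):
        running += hist[k]
        if running >= half:
            return k
    raise ValueError("empty histogram")


def mean_bounds_ms(hist):
    """Lower/upper bounds on the mean, using each bucket's min and max value."""
    n = total_calls(hist)
    low = sum(c * 2**k for k, c in hist.items()) / n
    high = sum(c * (2**(k + 1) - 1) for k, c in hist.items()) / n
    return low, high


def fraction_at_least(hist, bucket):
    """Fraction of calls that landed in `bucket` or any slower bucket."""
    return sum(c for k, c in hist.items() if k >= bucket) / total_calls(hist)


if __name__ == "__main__":
    k = median_bucket(HISTOGRAM)
    low, high = mean_bounds_ms(HISTOGRAM)
    print("completed calls:", total_calls(HISTOGRAM))
    print(f"median bucket: {k} ({2**k}-{2**(k + 1) - 1} ms)")
    print(f"mean is between {low:.0f} and {high:.0f} ms")
    print(f"calls taking >= 1024 ms: {fraction_at_least(HISTOGRAM, 10):.1%}")
```

The median call lands in the 32-63 ms bucket, which fits the 45 ms typical figure, while the roughly 9% of calls taking a second or more pull the average up; the bucket bounds (about 278 to 556 ms) bracket the reported 385 ms average.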

Timeouts from getFile: 29120

So the reduce's prepare phase is dominated by calls to getFile (64k calls * 385 ms = ~6.8 hours).
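The 6.8-hour estimate is the number of fetches times the average getFile latency. A small Python sketch of that arithmetic is below; `fetch_phase_hours` is a hypothetical helper, and its `in_flight` divisor is an idealized assumption (it presumes the average latency does not grow when calls overlap), so it gives a best case for overlapping fetches, not a prediction.

```python
def fetch_phase_hours(n_maps, avg_fetch_s, in_flight=1):
    """Idealized time, in hours, to fetch one output from each of n_maps maps.

    Assumes fetches are independent and that the average latency stays the
    same when `in_flight` requests overlap, which is optimistic.
    """
    return n_maps * avg_fetch_s / in_flight / 3600


serial = fetch_phase_hours(64_000, 0.385)         # one getFile call at a time
overlapped = fetch_phase_hours(64_000, 0.385, 5)  # five calls in flight
print(f"serial: {serial:.1f} h, 5 in flight: {overlapped:.1f} h")
```

Taking 64k as 64,000 reproduces the 6.8 hours; with 65,536 maps the same formula gives about 7.0 hours, close to the 7 hours of fetching seen in the logs.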

As a first pass, I'll try increasing the number of threads serving map output from 2 to 20,
and using the parallel RPC call to fetch 5 files at a time.
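Keeping several fetches in flight at once could look roughly like the Python sketch below. Hadoop itself is Java and the real parallel RPC call is not shown here; `fetch_all`, `simulated_fetch`, and the integer map ids are hypothetical. A thread pool with `max_workers=5` caps the number of outstanding requests at five.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(map_ids, fetch, parallelism=5):
    """Fetch every map output with at most `parallelism` requests in flight.

    `fetch` is a caller-supplied (hypothetical) function that takes a map id
    and returns that map's output bytes for this reduce.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {pool.submit(fetch, m): m for m in map_ids}
        for fut in as_completed(futures):
            # result() re-raises any exception from a failed fetch.
            results[futures[fut]] = fut.result()
    return results


def simulated_fetch(map_id):
    """Stand-in for a remote getFile call: sleeps briefly, returns fake data."""
    time.sleep(0.01)
    return f"output-of-map-{map_id}".encode()


outputs = fetch_all(range(20), simulated_fetch)
print(len(outputs), outputs[7])
```

A failed fetch surfaces through `fut.result()`; with ~29k getFile timeouts in this run, a real fetcher would also want per-map retries, which this sketch leaves out.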

Thoughts?

> transfer map output transfer with http instead of rpc
> -----------------------------------------------------
>
>          Key: HADOOP-195
>          URL: http://issues.apache.org/jira/browse/HADOOP-195
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>
> The map output should be transferred via HTTP instead of RPC, because RPC is very slow
> for this application and its timeout behavior is suboptimal (the server sends data and the
> client ignores it because it took more than 10 seconds to be received).
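A minimal sketch of the HTTP approach, using only Python's standard library rather than Hadoop's Java code: a small threaded server that serves map outputs by URL path, and a client whose timeout applies to each blocking socket operation (connect and each read) instead of the whole transfer, so a slow transfer that is still making progress is not discarded. The URL layout, `MAP_OUTPUTS` store, and function names are hypothetical.

```python
import http.server
import threading
import urllib.request

# Hypothetical stand-in for map output files held by a TaskTracker,
# keyed by the URL path a reduce would request.
MAP_OUTPUTS = {"/map_0.out": b"key1\tvalue1\nkey2\tvalue2\n"}


class MapOutputHandler(http.server.BaseHTTPRequestHandler):
    """Serves one map output per GET request as a plain byte stream."""

    def do_GET(self):
        data = MAP_OUTPUTS.get(self.path)
        if data is None:
            self.send_error(404, "no such map output")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_server():
    """Start a threaded HTTP server on a free local port and return it."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), MapOutputHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_map_output(url, idle_timeout=10.0, chunk_size=64 * 1024):
    """Download one map output over HTTP.

    The timeout bounds each blocking socket operation (connect, each read)
    rather than the whole transfer, so a large transfer that keeps making
    progress is not thrown away after a fixed deadline.
    """
    # Empty ProxyHandler: talk to the server directly, ignoring proxy env vars.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    chunks = []
    with opener.open(url, timeout=idle_timeout) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, `server = start_server()` followed by `fetch_map_output(f"http://127.0.0.1:{server.server_address[1]}/map_0.out")` returns the stored bytes. Note that `ThreadingHTTPServer` starts a thread per request rather than using a fixed pool like the 20 serving threads mentioned above.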

-- 
This message is automatically generated by JIRA.