hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-141) Disk thrashing / task timeouts during map output copy phase
Date Sat, 14 Oct 2006 03:30:38 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-141?page=all ]

Owen O'Malley resolved HADOOP-141.

    Fix Version/s: 0.3.0
       Resolution: Fixed
         Assignee: Owen O'Malley

> Disk thrashing / task timeouts during map output copy phase
> -----------------------------------------------------------
>                 Key: HADOOP-141
>                 URL: http://issues.apache.org/jira/browse/HADOOP-141
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>         Environment: linux
>            Reporter: p sutter
>         Assigned To: Owen O'Malley
>             Fix For: 0.3.0
> MapOutputProtocol connections cause timeouts because the system thrashes and transfers the same file over and over again, ultimately making no forward progress (medium-sized job, 500GB input file, map output about as large as the input, 10-node cluster).
> There are several bugs behind this, but the following two changes improved matters considerably.
> (1) 
> The buffer size in MapOutputFile is currently hardcoded to 8192 bytes (for both reads and writes). Changing this buffer size to 256KB reduced the number of disk seeks, and the problem went away.
> Ideally, there would be a buffer-size parameter for this that is separate from the DFS I/O buffer size.
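A minimal sketch of the idea in (1), assuming a plain stream copy: the buffer size becomes a parameter instead of a hardcoded 8192. The class name, method names, and the 256KB default are hypothetical illustrations, not Hadoop's actual MapOutputFile code or configuration keys. The `readCalls` helper shows the arithmetic behind the change: for roughly 500GB of map output, an 8KB buffer needs 32 times as many read calls as a 256KB buffer. When many copies run concurrently, each refill of a small buffer can cost a seek, so fewer and larger reads plausibly mean fewer seeks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: not the real MapOutputFile implementation.
class MapOutputCopySketch {
    // Assumed default, matching the 256KB value reported in the issue.
    static final int DEFAULT_MAP_OUTPUT_BUFFER = 256 * 1024;

    /** Copies every byte from in to out through a buffer of bufferSize bytes; returns bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    /** Number of buffer-sized reads needed for a file of fileBytes (ceiling division). */
    static long readCalls(long fileBytes, int bufferSize) {
        return (fileBytes + bufferSize - 1) / bufferSize;
    }

    public static void main(String[] args) throws IOException {
        long fiveHundredGB = 500L * 1024 * 1024 * 1024;
        System.out.println(readCalls(fiveHundredGB, 8192));       // 65536000
        System.out.println(readCalls(fiveHundredGB, 256 * 1024)); // 2048000

        // Round-trip a small in-memory payload to exercise copy().
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(new byte[1_000_000]), sink, DEFAULT_MAP_OUTPUT_BUFFER);
        System.out.println(copied); // 1000000
    }
}
```

In a real fix the value would come from a configuration property distinct from the DFS I/O buffer size, as the reporter suggests; the property name is left open here.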
> (2)
> I also added the following code to the socket configuration in both Server.java and Client.java. Disabling linger is a small but worthwhile change in an environment with some packet loss (and you will have packet loss when all the nodes get busy at once). 256KB buffers are probably excessive, especially on a LAN, but it takes me two hours to test a change, so I haven't experimented.
> socket.setSendBufferSize(256*1024);
> socket.setReceiveBufferSize(256*1024);
> socket.setSoLinger(false, 0);
> socket.setKeepAlive(true);
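The four socket calls quoted above can be applied to a client socket roughly as in the following sketch. The class name, helper names, and the 256KB constant are illustrative assumptions, not the actual Server.java or Client.java code. The socket is created unconnected and configured before `connect()`, because the Java documentation notes that a receive buffer larger than 64KB must be requested before connecting to influence the TCP window. Both buffer sizes are only hints that the operating system may round or cap.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical sketch of the socket options from the issue; names are illustrative.
class SocketTuningSketch {
    // The reporter's value; per the issue, probably excessive on a LAN.
    static final int BUFFER_BYTES = 256 * 1024;

    /** Applies the quoted options; call before connect(). */
    static void configure(Socket socket) throws IOException {
        socket.setSendBufferSize(BUFFER_BYTES);    // SO_SNDBUF hint
        socket.setReceiveBufferSize(BUFFER_BYTES); // SO_RCVBUF hint
        socket.setSoLinger(false, 0);              // SO_LINGER off: close() does not block on unsent data
        socket.setKeepAlive(true);                 // SO_KEEPALIVE: probe idle connections
    }

    /** Creates an unconnected socket, configures it, then connects with a timeout. */
    static Socket connect(String host, int port, int timeoutMillis) throws IOException {
        Socket socket = new Socket();
        configure(socket);
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        return socket;
    }
}
```

On the server side, accepted sockets inherit the receive-buffer size of the listening `ServerSocket`, so a large receive buffer there would be set with `ServerSocket.setReceiveBufferSize` before binding.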

This message is automatically generated by JIRA.


View raw message