hadoop-mapreduce-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5791) Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not read disks efficiently
Date Fri, 14 Mar 2014 00:15:46 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934339#comment-13934339 ]

Chris Nauroth commented on MAPREDUCE-5791:

A few comments:

# {{ShuffleHandler}}
## {{SHUFFLE_BUFFER_SIZE}} is potentially confusing, because it's reusing {{io.file.buffer.size}},
which is already used elsewhere, and with a different default value.  I recommend making this
an MR-specific property, like {{mapreduce.shuffle.transfer.buffer.size}}, and documenting
it in mapred-default.xml.
# {{TestFadvisedFileRegion}}
## Minor nit: we put the opening { inline, not on a separate line.
## {{out}}, {{inputFile}}, {{targetFile}}, {{target}} and {{in}} all should be closed inside
a finally block to guarantee cleanup.  The {{IOUtils#cleanup}} method is helpful for this.
 For {{fileRegion}}, it looks like we ought to call {{releaseExternalResources}}: http://grepcode.com/file/repository.jboss.org/nexus/content/repositories/releases/org.jboss.netty/netty/3.2.0.CR1/org/jboss/netty/channel/DefaultFileRegion.java.
## {{testCustomShuffleTransferCornerCases}}: This can be removed if you don't intend to add
test code in here.
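
To illustrate the finally-block cleanup suggested above for {{TestFadvisedFileRegion}}, here is a minimal sketch in plain Java. It is not taken from the attached patch. The class {{CleanupSketch}}, the method {{copy}} and the helper {{closeAll}} are hypothetical names; {{closeAll}} only stands in for {{IOUtils#cleanup}} so that the example compiles without Hadoop on the classpath. The {{releaseExternalResources}} call for {{fileRegion}} is left out because it needs Netty.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

final class CleanupSketch {

    // Hypothetical stand-in for Hadoop's IOUtils#cleanup: closes every
    // non-null resource and ignores IOExceptions thrown by close().
    static void closeAll(Closeable... resources) {
        for (Closeable resource : resources) {
            if (resource != null) {
                try {
                    resource.close();
                } catch (IOException ignored) {
                    // Best-effort cleanup: nothing useful to do here.
                }
            }
        }
    }

    // Copies inputFile to targetFile and returns the number of bytes copied.
    // All streams and channels are declared before the try block so the
    // finally block can close them even if an open or a transfer fails.
    static long copy(File inputFile, File targetFile) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        FileChannel source = null;
        FileChannel target = null;
        try {
            in = new FileInputStream(inputFile);
            out = new FileOutputStream(targetFile);
            source = in.getChannel();
            target = out.getChannel();
            long size = source.size();
            long position = 0;
            while (position < size) {
                long moved = source.transferTo(position, size - position, target);
                if (moved <= 0) {
                    break; // No progress; stop rather than spin.
                }
                position += moved;
            }
            return position;
        } finally {
            // Runs on both the success path and the exception path.
            closeAll(target, source, out, in);
        }
    }
}
```

In the real test, {{IOUtils.cleanup(LOG, ...)}} from {{org.apache.hadoop.io.IOUtils}} would take the place of {{closeAll}}.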

> Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not read disks efficiently
> ------------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-5791
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5791
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Nikola Vujic
>            Assignee: Nikola Vujic
>         Attachments: MAPREDUCE-5791.patch
> The transferTo method in org.apache.hadoop.mapred.FadvisedFileRegion uses the transferTo
method of a FileChannel to transfer data from disk to a socket. This performs slowly
on Windows, slower than on Linux. The reason is that the java.nio transferTo method
issues 32K IO requests all the time. On Windows, these 32K transfers are not optimal, and
we don't get the best performance from the underlying IO subsystem. To achieve better
performance when reading from the drives, we need to read data in bigger chunks, 512K for
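
The idea of reading in bigger chunks can be sketched in plain java.nio as follows. This is only an illustration under stated assumptions, not the code in MAPREDUCE-5791.patch: the class {{ChunkedTransferSketch}}, the method {{transfer}} and the {{bufferSize}} parameter are hypothetical names, and the target is assumed to be a blocking channel. Each positional read asks the file channel for up to {{bufferSize}} bytes at once (for example 512 * 1024), then drains the buffer to the target.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

final class ChunkedTransferSketch {

    // Copies up to count bytes from source, starting at position, into
    // target, using one buffer of bufferSize bytes per read request.
    // Returns the number of bytes actually transferred, which is smaller
    // than count if the end of the file is reached first.
    static long transfer(FileChannel source, WritableByteChannel target,
                         long position, long count, int bufferSize)
            throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
        long transferred = 0;
        while (transferred < count) {
            buffer.clear();
            long remaining = count - transferred;
            if (remaining < buffer.capacity()) {
                // Do not read past the requested range.
                buffer.limit((int) remaining);
            }
            // Positional read: does not move the channel's own position.
            int read = source.read(buffer, position + transferred);
            if (read <= 0) {
                break; // End of file.
            }
            buffer.flip();
            // Assumes a blocking target; write() may take several calls.
            while (buffer.hasRemaining()) {
                target.write(buffer);
            }
            transferred += read;
        }
        return transferred;
    }
}
```

A caller would pass the shuffle file's channel, the socket channel, and a buffer size taken from configuration.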

This message was sent by Atlassian JIRA
