hadoop-mapreduce-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6923) Optimize MapReduce Shuffle I/O for small partitions
Date Wed, 09 Aug 2017 22:43:00 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated MAPREDUCE-6923:
------------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0-beta1
                   2.9.0
           Status: Resolved  (was: Patch Available)

Committed to branch-2 and trunk. Thanks a lot for your contribution, Robert!

Good luck with your research. I hope to hear back from you when you publish, and I look forward to more valuable contributions from you.

> Optimize MapReduce Shuffle I/O for small partitions
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-6923
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6923
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>         Environment: Observed in Hadoop 2.7.3 and above (judging from the source code of future versions), and Ubuntu 16.04.
>            Reporter: Robert Schmidtke
>            Assignee: Robert Schmidtke
>             Fix For: 2.9.0, 3.0.0-beta1
>
>         Attachments: MAPREDUCE-6923.00.patch, MAPREDUCE-6923.01.patch
>
>
> When a job configuration results in small partitions read by each reducer from each mapper (e.g. 65 kilobytes as in my setup: a [TeraSort|https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/terasort/TeraSort.java] of 256 gigabytes using 2048 mappers and reducers each), and setting
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transferTo.allowed</name>
>   <value>false</value>
> </property>
> {code}
> then the default setting of
> {code:xml}
> <property>
>   <name>mapreduce.shuffle.transfer.buffer.size</name>
>   <value>131072</value>
> </property>
> {code}
> results in almost 100% overhead in reads during shuffle in YARN, because for each 65K needed, 128K are read.
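> As a rough sanity check of these numbers (an illustrative sketch, not part of the patch):
> {code:java}
> public class ShuffleOverheadCheck {
>   public static void main(String[] args) {
>     long inputBytes = 256L * 1024 * 1024 * 1024;     // 256 GiB TeraSort input
>     long partitions = 2048L * 2048;                  // one partition per mapper/reducer pair
>     long partitionBytes = inputBytes / partitions;   // 65,536 bytes, the ~65K mentioned above
>     long bufferBytes = 128 * 1024;                   // default shuffle transfer buffer (131072)
>     // Each ~65K partition is served by a full 128K buffered read, so the bytes
>     // read are roughly double the bytes actually needed (~100% read overhead).
>     double overhead = (double) bufferBytes / partitionBytes - 1.0;
>     System.out.printf("partition = %d bytes, read overhead = %.0f%%%n", partitionBytes, overhead * 100);
>   }
> }
> {code}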
> I propose a fix in [FadvisedFileRegion.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java#L114] as follows:
> {code:java}
> ByteBuffer byteBuffer = ByteBuffer.allocate(Math.min(this.shuffleBufferSize, trans > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) trans));
> {code}
> e.g. [here|https://github.com/apache/hadoop/compare/branch-2.7.3...robert-schmidtke:adaptive-shuffle-buffer]. This sets the shuffle buffer size to the minimum of the shuffle buffer size specified in the configuration (128K by default) and the actual partition size (65K on average in my setup). In my benchmarks this reduced the read overhead in YARN from about 100% (255 additional gigabytes as described above) down to about 18% (an additional 45 gigabytes). The runtime of the job remained the same in my setup.
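> For reference, a minimal self-contained sketch of the same capping idea (class, method, and parameter names here are illustrative assumptions, not the actual FadvisedFileRegion.customShuffleTransfer code); relative to a fixed-size buffer, the only change is the Math.min cap on the allocation, the copy loop itself is untouched:
> {code:java}
> import java.io.IOException;
> import java.nio.ByteBuffer;
> import java.nio.channels.FileChannel;
> import java.nio.channels.WritableByteChannel;
>
> // Illustrative sketch of the adaptive-buffer transfer; hypothetical names,
> // not the actual FadvisedFileRegion implementation.
> class AdaptiveShuffleCopy {
>   static long copyPartition(FileChannel in, WritableByteChannel out,
>                             long position, long count, int shuffleBufferSize) throws IOException {
>     // Cap the buffer at the remaining partition size, so a ~65K partition
>     // does not allocate and fill a full 128K buffer.
>     int bufSize = (int) Math.min(shuffleBufferSize, Math.min(count, Integer.MAX_VALUE));
>     ByteBuffer buffer = ByteBuffer.allocate(bufSize);
>     long transferred = 0;
>     while (transferred < count) {
>       buffer.clear();
>       buffer.limit((int) Math.min(buffer.capacity(), count - transferred));
>       int read = in.read(buffer, position + transferred);
>       if (read <= 0) {
>         break; // EOF or nothing read
>       }
>       buffer.flip();
>       while (buffer.hasRemaining()) {
>         out.write(buffer);
>       }
>       transferred += read;
>     }
>     return transferred;
>   }
> }
> {code}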



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

