hadoop-mapreduce-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6678) Allow ShuffleHandler readahead without drop-behind
Date Tue, 10 May 2016 16:52:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278438#comment-15278438 ]

Hudson commented on MAPREDUCE-6678:

FAILURE: Integrated in Hadoop-trunk-Commit #9739 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9739/])
MAPREDUCE-6678. Allow ShuffleHandler readahead without drop-behind. (epayne: rev cd35b692de88e3afe7f41405da635c3fbd9b4650)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java

> Allow ShuffleHandler readahead without drop-behind
> --------------------------------------------------
>                 Key: MAPREDUCE-6678
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6678
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>             Fix For: 2.8.0
>         Attachments: YARN-4964.001.patch
> Currently mapreduce.shuffle.manage.os.cache enables/disables both readahead (POSIX_FADV_WILLNEED)
> and drop-behind (POSIX_FADV_DONTNEED) logic within the ShuffleHandler.
> It would be beneficial if these were separately configurable.
> - Running without readahead can lead to significant seek storms caused by large numbers
>   of sendfile() calls competing with one another.
> - However, running with drop-behind can also lead to seek storms, because there are cases
>   where the server successfully writes the shuffle bytes to the network but the client doesn't
>   want the bytes right now (for example, when MergeManager wants to WAIT), so it ignores them
>   and asks for them again a bit later. This causes repeated reads of the same data from disk.
> I'll attach a simple patch that enables/disables readahead based on mapreduce.shuffle.readahead.bytes==0,
> leaving mapreduce.shuffle.manage.os.cache controlling only the drop-behind.
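
The split described in the issue can be sketched as a small decision helper. This is an illustration only, not the code from the attached patch or from FadvisedFileRegion.java: the class ShuffleCachePolicy, its fields, and the default values are hypothetical, and a plain Map stands in for a Hadoop Configuration. Only the two property names come from the issue text. The sketch also assumes that a negative readahead size should be treated like 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class ShuffleCachePolicy {
    // Property names as given in the issue description.
    static final String MANAGE_OS_CACHE = "mapreduce.shuffle.manage.os.cache";
    static final String READAHEAD_BYTES = "mapreduce.shuffle.readahead.bytes";

    // Assumed defaults, chosen for this sketch only.
    static final boolean ASSUMED_MANAGE_OS_CACHE = true;
    static final int ASSUMED_READAHEAD_BYTES = 4 * 1024 * 1024;

    final boolean readahead;   // would issue POSIX_FADV_WILLNEED
    final boolean dropBehind;  // would issue POSIX_FADV_DONTNEED
    final int readaheadBytes;

    private ShuffleCachePolicy(boolean manageOsCache, int readaheadBytes) {
        this.readaheadBytes = readaheadBytes;
        // Readahead depends only on the readahead size; 0 disables it.
        // Assumption: negative values are treated as disabled too.
        this.readahead = readaheadBytes > 0;
        // Drop-behind depends only on manage.os.cache.
        this.dropBehind = manageOsCache;
    }

    /** Builds the policy from a plain key/value map standing in for a Configuration. */
    static ShuffleCachePolicy fromConf(Map<String, String> conf) {
        String manage = conf.get(MANAGE_OS_CACHE);
        String bytes = conf.get(READAHEAD_BYTES);
        boolean manageOsCache = (manage == null)
            ? ASSUMED_MANAGE_OS_CACHE
            : Boolean.parseBoolean(manage.trim());
        int readaheadBytes = (bytes == null)
            ? ASSUMED_READAHEAD_BYTES
            : Integer.parseInt(bytes.trim());
        return new ShuffleCachePolicy(manageOsCache, readaheadBytes);
    }
}
```

With this shape, the configuration the issue asks for (readahead on, drop-behind off) would correspond to a non-zero mapreduce.shuffle.readahead.bytes together with mapreduce.shuffle.manage.os.cache set to false.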
