hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6489) Fail fast rogue tasks that write too much to local disk
Date Tue, 20 Oct 2015 22:36:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965902#comment-14965902 ]

Jason Lowe commented on MAPREDUCE-6489:

+1, the latest patch looks good to me.  However, the patch does not apply cleanly to branch-2.
[~maysamyabandeh], could you provide a branch-2 patch as well?

> Fail fast rogue tasks that write too much to local disk
> -------------------------------------------------------
>                 Key: MAPREDUCE-6489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.1
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>         Attachments: MAPREDUCE-6489.001.patch, MAPREDUCE-6489.002.patch, MAPREDUCE-6489.003.patch
> Tasks of rogue jobs can write too much to local disk, negatively affecting the jobs
running in collocated containers. Ideally, YARN will be able to limit the amount of local disk
used by each task: YARN-4011. Until then, the MapReduce task can fail fast if the task is
writing too much (above a configured threshold) to local disk.
> As we discussed [here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750],
the suggested approach is that the MapReduce task checks the BYTES_WRITTEN counter for the
local disk and throws an exception when it goes beyond a configured value.  It is true that
the number of bytes written can be larger than the disk space actually used, but to detect a rogue
task the exact value is not required, and a very large value for bytes written to local disk is a
good indication that the task is misbehaving.
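
The check described above could look roughly like the following. This is a minimal sketch in plain Java, not the code from the attached patches: the class name LocalDiskWriteLimiter, the exception type, the LongSupplier that stands in for the local-disk BYTES_WRITTEN counter, and the "negative limit disables the check" convention are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of a fail-fast check on bytes written to local disk.
 * Not the implementation from the MAPREDUCE-6489 patches.
 */
final class LocalDiskWriteLimiter {

    /** Thrown when the task has written more than the configured limit. */
    static final class WriteLimitExceededException extends RuntimeException {
        WriteLimitExceededException(String message) {
            super(message);
        }
    }

    // Stands in for reading the task's local-disk BYTES_WRITTEN counter.
    private final LongSupplier bytesWritten;
    // Configured threshold in bytes; a negative value disables the check (assumption).
    private final long limitBytes;

    LocalDiskWriteLimiter(LongSupplier bytesWritten, long limitBytes) {
        this.bytesWritten = bytesWritten;
        this.limitBytes = limitBytes;
    }

    /** Throws if the counter has gone beyond the configured value. */
    void check() {
        if (limitBytes < 0) {
            return; // no limit configured
        }
        long written = bytesWritten.getAsLong();
        if (written > limitBytes) {
            throw new WriteLimitExceededException(
                "wrote " + written + " bytes to local disk, limit is " + limitBytes);
        }
    }

    public static void main(String[] args) {
        long[] counter = {0L}; // simulated counter value
        LocalDiskWriteLimiter limiter = new LocalDiskWriteLimiter(() -> counter[0], 1024L);

        counter[0] = 512L;
        limiter.check(); // below the limit, nothing happens

        counter[0] = 4096L;
        try {
            limiter.check();
        } catch (WriteLimitExceededException e) {
            System.out.println("task failed fast: " + e.getMessage());
        }
    }
}
```

In a real task the check would presumably be called periodically, for example alongside progress reporting, so that a rogue task is stopped soon after it crosses the threshold. Because the counter only grows, it may overestimate actual disk usage, which matches the trade-off accepted in the description.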

This message was sent by Atlassian JIRA
