hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAPREDUCE-7022) Fast fail rogue jobs based on task scratch dir size
Date Fri, 26 Jan 2018 20:41:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341561#comment-16341561 ]

Jason Lowe edited comment on MAPREDUCE-7022 at 1/26/18 8:40 PM:

Thanks for updating the patch! The unit test failure is a known issue tracked by MAPREDUCE-7020, and the ASF license issue is unrelated.

+1 for the latest patch.  Committing this.

was (Author: jlowe):
Thanks for updating the patch!

+1 for the latest patch.  Committing this.

> Fast fail rogue jobs based on task scratch dir size
> ---------------------------------------------------
>                 Key: MAPREDUCE-7022
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7022
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 2.7.0, 2.8.0, 2.9.0
>            Reporter: Johan Gustavsson
>            Assignee: Johan Gustavsson
>            Priority: Major
>         Attachments: MAPREDUCE-7022.001.patch, MAPREDUCE-7022.002.patch, MAPREDUCE-7022.003.patch,
MAPREDUCE-7022.004.patch, MAPREDUCE-7022.005.patch, MAPREDUCE-7022.006.patch, MAPREDUCE-7022.007.patch,
MAPREDUCE-7022.008.patch, MAPREDUCE-7022.009.patch
> With the introduction of MAPREDUCE-6489 there are some options to kill rogue tasks based
> on their writes to local disk. In our environment, where we mainly run Hive-based jobs, we
> noticed that this counter and the size of the local scratch dirs could differ greatly. We
> had tasks where the BYTES_WRITTEN counter was at 300 GB and others where it was at 10 TB,
> both producing around 200 GB on local disk, so the counter didn't help us much. To extend
> this feature, tasks should monitor the size of their local scratch dir and fail once it
> passes a configured limit. In these cases the tasks should not be retried either; instead,
> the job should fast fail.
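For illustration, a minimal sketch of the kind of check the description asks for: periodically summing the size of a task's local scratch dir and failing fast once it crosses a configured byte limit. The class name, limit value, and error handling below are hypothetical, not taken from the patch; the real feature would report a fatal, non-retriable error to the AM rather than throw.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ScratchDirLimitCheck {
  // Hypothetical limit: 200 GB, expressed in bytes.
  private static final long LIMIT_BYTES = 200L * 1024 * 1024 * 1024;

  // Recursively sum the sizes of all regular files under the scratch dir.
  static long scratchDirSize(Path scratchDir) throws IOException {
    try (Stream<Path> files = Files.walk(scratchDir)) {
      return files.filter(Files::isRegularFile)
                  .mapToLong(p -> {
                    try {
                      return Files.size(p);
                    } catch (IOException e) {
                      return 0L; // file may have been deleted mid-walk
                    }
                  })
                  .sum();
    }
  }

  public static void main(String[] args) throws IOException {
    Path scratchDir = Paths.get(args[0]);
    long used = scratchDirSize(scratchDir);
    if (used > LIMIT_BYTES) {
      // Stand-in for the fast-fail path: the task would surface this as a
      // fatal error so the job fails immediately instead of retrying.
      throw new IllegalStateException("Scratch dir " + scratchDir + " used "
          + used + " bytes, over limit " + LIMIT_BYTES);
    }
  }
}
{code}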

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org
