hadoop-mapreduce-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MAPREDUCE-135) speculative task failure can kill jobs
Date Thu, 17 Jul 2014 16:47:05 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-135.
----------------------------------------

    Resolution: Incomplete

I'm going to close this out as stale.  I suspect this is no longer an issue.

> speculative task failure can kill jobs
> --------------------------------------
>
>                 Key: MAPREDUCE-135
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-135
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>
> We had a case where the random writer example was killed by speculative execution. It happened like this:
> task_0001_m_000123_0 -> starts
> task_0001_m_000123_1 -> starts and fails because attempt 0 is creating the file
> task_0001_m_000123_2 -> starts and fails because attempt 0 is creating the file
> task_0001_m_000123_3 -> starts and fails because attempt 0 is creating the file
> task_0001_m_000123_4 -> starts and fails because attempt 0 is creating the file
> job_0001 is killed because map_000123 failed 4 times.
> From this experience, I think we should change the scheduling so that:
>   1. Tasks are only allowed 1 speculative attempt.
>   2. TIPs don't kill jobs until they have 4 failures AND the last task under that tip fails.
> Thoughts?
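
The two proposed rules can be illustrated with a minimal sketch. This is not Hadoop code: the `TaskInProgress` class, its fields, and its method names are hypothetical, and it assumes that "1 speculative attempt" means at most one extra attempt running alongside the original, and that "the last task under that tip fails" means no attempt of the TIP is still running.

```python
from dataclasses import dataclass, field

MAX_FAILURES = 4      # failure threshold mentioned in the report
MAX_SPECULATIVE = 1   # proposal 1: at most one speculative attempt (assumed: at a time)


@dataclass
class TaskInProgress:
    """Hypothetical stand-in for a TIP; not Hadoop's actual class."""
    failures: int = 0
    next_attempt: int = 0
    running: set = field(default_factory=set)

    def can_launch(self) -> bool:
        # One original attempt plus at most MAX_SPECULATIVE concurrent extras.
        return len(self.running) < 1 + MAX_SPECULATIVE

    def launch(self) -> int:
        """Start a new attempt and return its attempt number."""
        if not self.can_launch():
            raise RuntimeError("speculative attempt limit reached")
        attempt = self.next_attempt
        self.next_attempt += 1
        self.running.add(attempt)
        return attempt

    def fail(self, attempt: int) -> bool:
        """Record a failed attempt; return True if the job should be killed."""
        self.running.remove(attempt)
        self.failures += 1
        # Proposal 2: kill only when the threshold is reached AND
        # no other attempt of this TIP is still running.
        return self.failures >= MAX_FAILURES and not self.running
```

Replaying the reported sequence against this sketch, attempt 0 keeps running while attempts 1 through 4 start and fail one at a time; under these assumptions none of those four failures kills the job, because attempt 0 is still alive and may yet succeed.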



--
This message was sent by Atlassian JIRA
(v6.2#6252)
