hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-979) speculative task failure can kill jobs
Date Mon, 05 Feb 2007 21:51:05 GMT
speculative task failure can kill jobs
--------------------------------------

                 Key: HADOOP-979
                 URL: https://issues.apache.org/jira/browse/HADOOP-979
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.11.0
            Reporter: Owen O'Malley
             Fix For: 0.12.0


We had a case where the random writer example was killed by speculative execution. It happened like this:

task_0001_m_000123_0 -> starts
task_0001_m_000123_1 -> starts and fails because attempt 0 is creating the file
task_0001_m_000123_2 -> starts and fails because attempt 0 is creating the file
task_0001_m_000123_3 -> starts and fails because attempt 0 is creating the file
task_0001_m_000123_4 -> starts and fails because attempt 0 is creating the file

job_0001 is killed because map_000123 failed 4 times. From this experience, I think we should
change the scheduling so that:

  1. Tasks are only allowed 1 speculative attempt.
  2. TIPs don't kill jobs until they have 4 failures AND the last task under that tip fails.
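
The two proposed rules could be sketched roughly as follows. This is a hypothetical illustration, not Hadoop's actual TaskInProgress code; the class and method names here are made up for the sketch.

```java
// Hypothetical sketch of the proposed scheduling rules.
// Names (SpeculativePolicy, canSpeculate, shouldKillJob) are illustrative,
// not part of Hadoop's real mapred API.
public class SpeculativePolicy {
    static final int MAX_SPECULATIVE_ATTEMPTS = 1; // rule 1
    static final int MAX_FAILURES = 4;             // rule 2

    // Rule 1: allow a speculative attempt only if none has been launched yet.
    static boolean canSpeculate(int speculativeAttemptsStarted) {
        return speculativeAttemptsStarted < MAX_SPECULATIVE_ATTEMPTS;
    }

    // Rule 2: kill the job only when the TIP has reached 4 failures AND
    // no attempt under that TIP is still running (i.e. the last one failed).
    static boolean shouldKillJob(int failures, int runningAttempts) {
        return failures >= MAX_FAILURES && runningAttempts == 0;
    }

    public static void main(String[] args) {
        // Scenario from this report: attempt 0 is still running while four
        // speculative attempts have failed -- the job should survive.
        System.out.println(shouldKillJob(4, 1)); // false: attempt 0 still alive
        System.out.println(shouldKillJob(4, 0)); // true: last attempt failed too
        System.out.println(canSpeculate(0));     // true: first speculative attempt
        System.out.println(canSpeculate(1));     // false: already speculated once
    }
}
```

Under these rules, task_0001_m_000123 would have launched only one speculative attempt, and the job would not have been killed while attempt 0 was still making progress.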

Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

