hadoop-mapreduce-dev mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (MAPREDUCE-4992) AM hangs in RecoveryService when recovering tasks with speculative attempts
Date Thu, 14 Mar 2013 16:08:13 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe reopened MAPREDUCE-4992:
-----------------------------------


This is still occurring in a number of ways:

* If the task attempt that succeeded was attempt 1 but there is no completion event in the
history file for attempt 0, the AM recovers only attempt 0 yet waits for attempt 1 to complete.
* If two task attempts succeed simultaneously, the AM recovers only attempt 0 yet waits for
attempt 1 to complete.
* If the prior AM attempt was backed up in event processing and launched speculative task
attempts *after* a task attempt completed, the AM ends up waiting on those speculative attempts
even though they were never actually launched.
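To make the failure mode concrete, here is a minimal Java sketch under a simplified model (an assumption, not Hadoop's actual logic): the recovering AM keeps a set of attempts it is waiting on and clears one entry for each completion event replayed from the prior attempt's history. The class `RecoveryHangSketch`, its methods, and the attempt names are hypothetical and do not correspond to the real `RecoveryService` code. The only point is that any awaited attempt without a replayable completion event keeps the pending count above zero, so recovery never finishes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the recovery hang. Not Hadoop code: the class name,
 * method names, attempt IDs, and set-based counting are illustrative assumptions.
 */
public class RecoveryHangSketch {

    /**
     * Returns how many attempts are still pending after replaying every
     * completion event found in the prior AM attempt's history.
     *
     * @param awaitedAttempts     attempts the recovery logic decided to wait for
     * @param replayedCompletions attempts that actually have a completion event to replay
     */
    static int pendingAfterReplay(List<String> awaitedAttempts, List<String> replayedCompletions) {
        Set<String> pending = new HashSet<>(awaitedAttempts);
        for (String attempt : replayedCompletions) {
            // Each replayed completion event clears the matching awaited attempt, if any.
            pending.remove(attempt);
        }
        return pending.size();
    }

    /** In this model, recovery can finish only when nothing is left pending. */
    static boolean recoveryCanFinish(List<String> awaitedAttempts, List<String> replayedCompletions) {
        return pendingAfterReplay(awaitedAttempts, replayedCompletions) == 0;
    }

    public static void main(String[] args) {
        // Healthy case: the awaited attempt has a completion event in history.
        System.out.println(recoveryCanFinish(List.of("attempt_0"), List.of("attempt_0")));

        // Like the first two bullets: only attempt 0 is replayed, but the AM waits on attempt 1.
        System.out.println(recoveryCanFinish(List.of("attempt_1"), List.of("attempt_0")));

        // Like the third bullet: a speculative attempt that was never launched has no event at all.
        System.out.println(recoveryCanFinish(List.of("attempt_0", "attempt_1"), List.of("attempt_0")));
    }
}
```

Running `main` prints `true`, `false`, `false`: in both failing scenarios the set of awaited attempts and the set of replayed completion events disagree, which is exactly the condition that leaves the AM stuck in recovery.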
                
> AM hangs in RecoveryService when recovering tasks with speculative attempts
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4992
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4992
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: trunk, 2.0.2-alpha, 0.23.6
>            Reporter: Robert Parker
>            Assignee: Robert Parker
>            Priority: Critical
>             Fix For: 0.23.7, 2.0.5-beta
>
>         Attachments: MAPREDUCE-4992v1.patch, MAPREDUCE-4992v2.patch
>
>
> A job hung in the Recovery Service on an AM restart. There were four map task events
> that were not processed, and they prevented the complete task count from reaching zero;
> the recovery service exits only when that count reaches zero. All four tasks were speculative.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
