hadoop-mapreduce-issues mailing list archives

From "Bhallamudi Venkata Siva Kamesh (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4013) Reduce task gets stuck when a M/R job is configured to tolerate failures
Date Mon, 19 Mar 2012 10:49:38 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Bhallamudi Venkata Siva Kamesh updated MAPREDUCE-4013:

    Attachment: MAPREDUCE-4013.patch

The shuffle phase hangs as long as *remainingMaps > 0*. We decrement *remainingMaps*
only when the copy phase is successful. For failed maps, however, we do not decrement
*remainingMaps*. We should decrement it in that case as well. Attaching a patch for this.
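
The bookkeeping described above can be illustrated with a minimal sketch. It is not the code from the attached patch: the class ShuffleProgressSketch and its method names are hypothetical, and only the *remainingMaps* counter comes from the description.

```java
/**
 * Hypothetical stand-in for the reducer-side shuffle bookkeeping.
 * This is an illustration only, not a Hadoop class.
 */
class ShuffleProgressSketch {
    // Number of map outputs the reducer is still waiting for.
    private int remainingMaps;

    ShuffleProgressSketch(int totalMaps) {
        this.remainingMaps = totalMaps;
    }

    // Called when a map output has been copied successfully.
    synchronized void copySucceeded() {
        remainingMaps--;
    }

    // Called when a map task has failed for good and its failure is tolerated.
    // Without this decrement the counter never reaches zero and the shuffle hangs.
    synchronized void mapFailedPermanently() {
        remainingMaps--;
    }

    // The shuffle phase may finish only once no maps remain.
    synchronized boolean shuffleDone() {
        return remainingMaps == 0;
    }
}
```

With 104 maps, 102 successful copies leave the counter at 2. Only after the two failed maps are also accounted for does shuffleDone() return true.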

Tested the patch by setting *mapreduce.map.failures.maxpercent* to 2. Ran a job with
104 map tasks and failed 2 of them. The job still passes.
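
As a rough check on why 2 failures out of 104 maps fit within a 2 percent limit, here is a small sketch of the arithmetic. The class FailureToleranceSketch and the method withinTolerance are hypothetical, and the exact comparison the framework uses is an assumption made for illustration.

```java
/** Hypothetical helper for illustrating the tolerance arithmetic; not a Hadoop class. */
class FailureToleranceSketch {
    // Assumption: a job stays within tolerance while
    // failedMaps / totalMaps <= maxPercent / 100.
    // Integer cross-multiplication avoids floating-point rounding.
    static boolean withinTolerance(int failedMaps, int totalMaps, int maxPercent) {
        return failedMaps * 100 <= maxPercent * totalMaps;
    }

    public static void main(String[] args) {
        // 2 * 100 = 200 and 2 * 104 = 208, so 2 failures are within tolerance.
        System.out.println(withinTolerance(2, 104, 2));
        // 3 * 100 = 300 exceeds 208, so a third failure would not be tolerated.
        System.out.println(withinTolerance(3, 104, 2));
    }
}
```

Under this assumption, the tested configuration sits just inside the limit, so the job is expected to pass as long as the reducers do not wait for the two failed maps.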

Please look into the patch and provide your comments.
> Reduce task gets stuck when a M/R job is configured to tolerate failures
> ------------------------------------------------------------------------
>                 Key: MAPREDUCE-4013
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4013
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Amar Kamat
>            Priority: Blocker
>              Labels: shuffle
>             Fix For: 0.24.0
>         Attachments: MAPREDUCE-4013.patch
> When a M/R job is configured to run with some tolerance to task failures (via mapreduce.map.failures.maxpercent),
> then the reduce task of that job gets stuck in the shuffle phase.


