hadoop-mapreduce-issues mailing list archives

From "Bhallamudi Venkata Siva Kamesh (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4013) Reduce task gets stuck when a M/R job is configured to tolerate failures
Date Thu, 15 Mar 2012 14:37:38 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230200#comment-13230200 ]

Bhallamudi Venkata Siva Kamesh commented on MAPREDUCE-4013:

I *think* the following could be the reason.

We initialize remainingMaps to totalMaps. If *mapreduce.map.failures.maxpercent* is set
to a non-zero value, the job continues to run even when some maps fail (up to the configured
percentage). However, remainingMaps is decremented only when a map output is copied successfully.
If even a single map fails, its output is never copied, so remainingMaps always stays
non-zero and the following check never succeeds:

 if (--remainingMaps == 0) {
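
To make the suspected stall concrete, here is a minimal, self-contained sketch of the counting logic described above. It is not Hadoop's actual shuffle code: the class ShuffleStallSketch, the nested Counter, and the method remainingAfterShuffle are hypothetical names used only for illustration. It assumes the counter starts at the total number of maps and is decremented only on a successful copy.

```java
public class ShuffleStallSketch {

    // Hypothetical stand-in for the reducer's bookkeeping (not a Hadoop class).
    static final class Counter {
        private int remainingMaps;

        Counter(int totalMaps) {
            // Assumption from the comment above: initialized to totalMaps,
            // regardless of how many map failures the job tolerates.
            this.remainingMaps = totalMaps;
        }

        // Called only when a map output has been copied successfully.
        // Returns true when the shuffle would consider itself finished.
        boolean copySucceeded() {
            return --remainingMaps == 0;
        }

        int remaining() {
            return remainingMaps;
        }
    }

    // Simulates a shuffle in which failedMaps outputs are never copied and
    // returns how many maps the counter is still waiting for afterwards.
    static int remainingAfterShuffle(int totalMaps, int failedMaps) {
        Counter counter = new Counter(totalMaps);
        for (int i = 0; i < totalMaps - failedMaps; i++) {
            counter.copySucceeded();
        }
        return counter.remaining();
    }

    public static void main(String[] args) {
        System.out.println(remainingAfterShuffle(10, 0)); // prints 0: shuffle completes
        System.out.println(remainingAfterShuffle(10, 1)); // prints 1: never reaches zero
    }
}
```

With one tolerated map failure the counter stops at 1, so a check of the form shown above would never fire, which matches the reported symptom of the reduce task hanging in the shuffle phase.
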
> Reduce task gets stuck when a M/R job is configured to tolerate failures
> ------------------------------------------------------------------------
>                 Key: MAPREDUCE-4013
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4013
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Amar Kamat
>            Priority: Blocker
>              Labels: shuffle
>             Fix For: 0.24.0
> When a M/R job is configured to run with some tolerance to task failures (via mapreduce.map.failures.maxpercent),
> then the reduce task of that job gets stuck in the shuffle phase.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira