hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4775) Reducer will "never" commit suicide
Date Wed, 07 Nov 2012 16:13:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492458#comment-13492458 ]

Robert Joseph Evans commented on MAPREDUCE-4775:
------------------------------------------------

OK, so I missed some of the code in shuffleScheduler.checkReducerHealth(). The stall check
is in there, but the previous check for a single map attempt is completely useless at this
point. Dropping the severity accordingly.

I am also confused about how a reducer could be stalled for over an hour (MAPREDUCE-4772)
and not be killed. I will look into that here too.

                
> Reducer will "never" commit suicide
> -----------------------------------
>
>                 Key: MAPREDUCE-4775
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4775
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>
> In 1.0 there are a number of conditions that will cause a reducer to commit suicide and
> exit. These include the reducer being stalled and the error percentage of total fetches
> being too high. In the new code it will only commit suicide when the total number of
> failures for a single task attempt is >= max(30, totalMaps/10). In the best case, with the
> quadratic back-off, it would take 20.5 hours for a single map attempt to reach 30 failures.
> And unless there is only one reducer running, the map task would have been restarted
> before then.
> We should go back to include the same reducer suicide checks that are in 1.0
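
For illustration, the threshold described above can be sketched as follows. This is not the
actual Hadoop code: the class and method names are hypothetical, and integer division for
totalMaps/10 is an assumption. It only restates the condition
failures >= max(30, totalMaps/10) from the description.

```java
// Hypothetical sketch of the per-attempt failure threshold described in the issue.
// Names are illustrative only and are not taken from the Hadoop source.
final class ReducerFailureThreshold {
    // Floor on the number of failures, per the description's max(30, ...).
    private static final int MIN_FAILURES = 30;

    // Number of fetch failures for a single map attempt needed before the
    // reducer gives up. Assumes integer division for totalMaps / 10.
    static int failureThreshold(int totalMaps) {
        return Math.max(MIN_FAILURES, totalMaps / 10);
    }

    // True when the failure count for one map attempt has reached the threshold.
    static boolean shouldAbort(int failuresForAttempt, int totalMaps) {
        return failuresForAttempt >= failureThreshold(totalMaps);
    }
}
```

Under these assumptions a job with 100 maps needs 30 failures on one attempt, and a job with
1000 maps needs 100, so larger jobs only move the threshold further out of reach.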

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
