hadoop-mapreduce-issues mailing list archives

From "Bhallamudi Venkata Siva Kamesh (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4013) Reduce task gets stuck when a M/R job is configured to tolerate failures
Date Tue, 27 Mar 2012 06:53:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239260#comment-13239260 ]

Bhallamudi Venkata Siva Kamesh commented on MAPREDUCE-4013:

Thanks, Ravi, for taking a look at the patch.

bq. What about the "progress of map tasks" when there are failed maps? Is it getting updated to 100%? I see copySucceded() is updating the progress of map tasks. So what happens when the last few maps fail?

Suppose a user has configured *mapreduce.map.failures.maxpercent* as 2, so the job can tolerate
up to 2% of map task failures.
Since "progress of map tasks" indicates the percentage of map tasks that completed successfully,
I *think* showing the actual *progress* may be more useful than showing 100%.
For example, if "progress of map tasks" shows 99%, it at least tells the user that 1% of the
map tasks failed, so they can take action on those failed map tasks.
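To make the arithmetic above concrete: with 100 map tasks and *mapreduce.map.failures.maxpercent* set to 2, one failed map is tolerated, and a progress figure based only on successful maps reads 99%. The Java sketch below is illustrative only. It assumes the job fails only when failedMaps * 100 exceeds maxPercent * totalMaps (an assumption about the exact comparison), and the class and method names are hypothetical, not taken from Hadoop or from the attached patch.

```java
// Hypothetical sketch: MapFailureToleranceSketch is an illustrative name,
// not a Hadoop class and not the code in MAPREDUCE-4013.patch.
final class MapFailureToleranceSketch {

    /**
     * Returns true if the failed maps stay within the tolerated percentage.
     * Assumption: the job fails only when failedMaps * 100 > maxPercent * totalMaps.
     */
    static boolean withinTolerance(int failedMaps, int totalMaps, int maxPercent) {
        // Use long arithmetic so large task counts cannot overflow.
        return (long) failedMaps * 100 <= (long) maxPercent * totalMaps;
    }

    /** Percentage of map tasks that completed successfully, rounded down. */
    static int successfulMapPercent(int succeededMaps, int totalMaps) {
        if (totalMaps <= 0) {
            throw new IllegalArgumentException("totalMaps must be positive");
        }
        return (int) ((long) succeededMaps * 100 / totalMaps);
    }

    public static void main(String[] args) {
        int totalMaps = 100;
        int failedMaps = 1;
        int maxPercent = 2; // value of mapreduce.map.failures.maxpercent
        System.out.println(withinTolerance(failedMaps, totalMaps, maxPercent));      // true
        System.out.println(successfulMapPercent(totalMaps - failedMaps, totalMaps)); // 99
    }
}
```

Under this reading, the 99% figure is what signals the 1% of failed maps to the user.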

OTOH, if "progress of map tasks" should indicate the overall progress of the map phase, then the
patch needs to be updated accordingly.
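For contrast, a minimal sketch of the alternative reading, where "progress of map tasks" tracks the map phase as a whole and a map that failed within the tolerance counts as finished. This is an assumption about how such a metric could be defined, not a description of the patch or of Hadoop's shuffle code; the names are hypothetical.

```java
// Hypothetical sketch: MapPhaseProgressSketch is an illustrative name,
// not a Hadoop class. Succeeded and tolerated-failed maps both count as finished.
final class MapPhaseProgressSketch {

    /** Percentage of map tasks that have finished (succeeded or failed), rounded down. */
    static int mapPhasePercent(int succeededMaps, int failedMaps, int totalMaps) {
        if (totalMaps <= 0) {
            throw new IllegalArgumentException("totalMaps must be positive");
        }
        return (int) ((long) (succeededMaps + failedMaps) * 100 / totalMaps);
    }

    public static void main(String[] args) {
        // 99 succeeded + 1 tolerated failure out of 100: the map phase is complete.
        System.out.println(mapPhasePercent(99, 1, 100)); // 100
        // 98 succeeded, none failed, 2 still running.
        System.out.println(mapPhasePercent(98, 0, 100)); // 98
    }
}
```

With this definition the same job reports 100%, which hides the failed map but matches "the map phase is done".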

As this issue is a duplicate, we can continue the discussion on [MAPREDUCE-3927|https://issues.apache.org/jira/browse/MAPREDUCE-3927].
> Reduce task gets stuck when a M/R job is configured to tolerate failures
> ------------------------------------------------------------------------
>                 Key: MAPREDUCE-4013
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4013
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Amar Kamat
>            Priority: Blocker
>              Labels: shuffle
>             Fix For: 0.24.0
>         Attachments: MAPREDUCE-4013.patch
> When an M/R job is configured to run with some tolerance to task failures (via mapreduce.map.failures.maxpercent), the reduce task of that job gets stuck in the shuffle phase.


