hadoop-mapreduce-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5817) Mappers get rescheduled on node transition even after all reducers are completed
Date Sat, 15 Aug 2015 14:40:48 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698294#comment-14698294 ]

Hudson commented on MAPREDUCE-5817:
-----------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #277 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/277/])
MAPREDUCE-5817. Mappers get rescheduled on node transition even after all reducers are completed.
(Sangjin Lee via kasha) (kasha: rev 27d24f96ab8d17e839a1ef0d7076efc78d28724a)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java
* hadoop-mapreduce-project/CHANGES.txt


> Mappers get rescheduled on node transition even after all reducers are completed
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5817
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.3.0
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>             Fix For: 2.8.0
>
>         Attachments: MAPREDUCE-5817.001.patch, MAPREDUCE-5817.002.patch, mapreduce-5817.patch
>
>
> We're seeing a behavior where a job keeps running long after all reducers have finished.
> We found that the job was rescheduling and running a number of mappers beyond the point of
> reducer completion. In one situation, the job ran for some 9 more hours after all reducers
> completed!
> This happens because whenever a node transition (to an unusable state) comes into the
> app master, it unconditionally reschedules all mappers that already ran on that node.
> Therefore, any node transition has the potential to extend the job's running time. Once this
> window opens, another node transition can prolong it, and in theory this can repeat
> indefinitely.
> If there is some instability in the pool (unhealthy nodes, etc.) for a duration, then any big
> job is severely vulnerable to this problem.
> If all reducers have completed, JobImpl.actOnUnusableNode() should not reschedule
> mapper tasks. At that point the mapper outputs are no longer needed, so rerunning the
> mappers would only produce output that is never consumed.
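
The guard proposed in the description can be sketched roughly as follows. This is a simplified, hypothetical model and not the actual JobImpl code or the attached patch: the class UnusableNodeSketch, the MapAttempt holder, and the method names other than actOnUnusableNode() are assumptions made for illustration, and task bookkeeping is reduced to plain strings. Treating a job with zero reducers as "all reducers complete" is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the guard described above; NOT the real JobImpl.
class UnusableNodeSketch {
    // Stand-in for a succeeded map attempt and the node it ran on (assumed shape).
    private static final class MapAttempt {
        final String taskId;
        final String nodeId;

        MapAttempt(String taskId, String nodeId) {
            this.taskId = taskId;
            this.nodeId = nodeId;
        }
    }

    private final int totalReduces;
    private int completedReduces = 0;
    private final List<MapAttempt> succeededMaps = new ArrayList<>();

    UnusableNodeSketch(int totalReduces) {
        this.totalReduces = totalReduces;
    }

    void mapSucceeded(String taskId, String nodeId) {
        succeededMaps.add(new MapAttempt(taskId, nodeId));
    }

    void reduceCompleted() {
        completedReduces++;
    }

    boolean allReducersComplete() {
        // Assumption of this sketch: a job with zero reducers also counts as complete.
        return completedReduces >= totalReduces;
    }

    // Returns the IDs of map tasks to rerun after the given node becomes unusable.
    List<String> actOnUnusableNode(String nodeId) {
        List<String> toReschedule = new ArrayList<>();
        if (allReducersComplete()) {
            // No reducer will fetch these map outputs anymore, so rerun nothing.
            return toReschedule;
        }
        for (MapAttempt attempt : succeededMaps) {
            if (attempt.nodeId.equals(nodeId)) {
                toReschedule.add(attempt.taskId);
            }
        }
        return toReschedule;
    }

    public static void main(String[] args) {
        UnusableNodeSketch job = new UnusableNodeSketch(1);
        job.mapSucceeded("map_0", "nodeA");
        // A reducer is still pending, so the map on the lost node is rerun.
        System.out.println(job.actOnUnusableNode("nodeA")); // prints [map_0]
        job.reduceCompleted();
        // All reducers are done, so the same node transition reschedules nothing.
        System.out.println(job.actOnUnusableNode("nodeA")); // prints []
    }
}
```

With the check in place, a node transition that arrives after the last reducer completes no longer reopens the window in which further transitions can keep extending the job.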



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
