hadoop-mapreduce-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6485) MR job hanged forever because all resources are taken up by reducers and the last map attempt never get resource to run
Date Fri, 02 Oct 2015 18:12:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941511#comment-14941511 ]

Hudson commented on MAPREDUCE-6485:
-----------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2414 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2414/])
MAPREDUCE-6485. Create a new task attempt with failed map task priority (rohithsharmaks: rev
439f43ad3defbac907eda2d139a793f153544430)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestTaskImpl.java


> MR job hanged forever because all resources are taken up by reducers and the last map
attempt never get resource to run
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6485
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6485
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 3.0.0, 2.4.1, 2.6.0, 2.7.1
>            Reporter: Bob
>            Assignee: Xianyin Xin
>            Priority: Critical
>             Fix For: 2.8.0
>
>         Attachments: MAPREDUCE-6485.001.patch, MAPREDUCE-6485.004.patch, MAPREDUCE-6485.005.patch,
MAPREDUCE-6485.006.patch, MAPREDUCE-6845.002.patch, MAPREDUCE-6845.003.patch
>
>
> The scenario is as follows:
> With mapreduce.job.reduce.slowstart.completedmaps=0.8, reducers acquire resources and start running before all maps have finished.
> It can then happen that all resources are taken up by running reducers while one map is still unfinished.
> In this situation, the last map has two task attempts.
> The first attempt was killed due to a timeout (mapreduce.task.timeout), and its state transitioned from RUNNING to FAIL_CONTAINER_CLEANUP and then to FAILED. The failed map attempt is not restarted, because a speculative map attempt is still in progress.
> The second attempt, which was started because map task speculation is enabled, is pending in the UNASSIGNED state because no resources are available. Its container request has a lower priority than the reducers' requests, so preemption does not happen.
> As a result, the reducers can never finish because one map is still outstanding, and the last map hangs because no resources are available. The job therefore never finishes.
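
The starvation described above can be illustrated with a small toy model. This is only a sketch of the reasoning in the report, not the ApplicationMaster's actual allocation code: the class MapStarvationSketch, its methods, and the numeric priority values are all hypothetical and chosen for illustration (a smaller number is assumed to mean "served first"). It shows why a pending map attempt requested at normal map priority stays stuck behind reducers, while the same attempt requested at failed-map priority, as the commit title suggests, can make progress.

```java
public class MapStarvationSketch {
    // Illustrative priority values (assumption): a smaller number is served first.
    static final int PRIORITY_FAST_FAIL_MAP = 5;
    static final int PRIORITY_REDUCE = 10;
    static final int PRIORITY_MAP = 20;

    /** Toy rule: a pending request may preempt reducers only if it outranks them. */
    static boolean canPreemptReducers(int pendingRequestPriority) {
        return pendingRequestPriority < PRIORITY_REDUCE;
    }

    /**
     * Returns true if the last pending map attempt can obtain a container,
     * so the toy job can finish; false means the job hangs.
     */
    static boolean jobCompletes(int totalContainers, int runningReducers,
                                int pendingMapPriority) {
        int freeContainers = totalContainers - runningReducers;
        if (freeContainers > 0) {
            return true; // a container is free, so the map attempt can run
        }
        // No free container: the map only runs if it can preempt a reducer.
        return canPreemptReducers(pendingMapPriority);
    }

    public static void main(String[] args) {
        // All 4 containers held by reducers, map requested at normal priority.
        System.out.println(jobCompletes(4, 4, PRIORITY_MAP));
        // Same cluster state, map requested at failed-map priority.
        System.out.println(jobCompletes(4, 4, PRIORITY_FAST_FAIL_MAP));
    }
}
```

In this model the first call reports a hang and the second reports completion. The real scheduler involves headroom calculations and preemption logic that this sketch deliberately leaves out.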



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
