hadoop-mapreduce-issues mailing list archives

From "NING DING (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6470) ApplicationMaster may fail to preempt Reduce task
Date Tue, 08 Sep 2015 02:03:45 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734129#comment-14734129 ]

NING DING commented on MAPREDUCE-6470:
--------------------------------------

[~kasha@cloudera.com] would you kindly help to take a look at this issue? ;)

> ApplicationMaster may fail to preempt Reduce task
> -------------------------------------------------
>
>                 Key: MAPREDUCE-6470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, resourcemanager, scheduler
>    Affects Versions: 2.7.1
>            Reporter: NING DING
>
> In my hadoop cluster the nodemanagers have different resource capacities.
> Recently, when the yarn cluster ran out of resources while some big jobs were running, the AM could not preempt reduce tasks.
> The scenario could be simplified as below:
> Say there are 5 nodemanagers in my hadoop cluster, with the FairScheduler strategy enabled.
> NodeManager capacity:
> namenode1 <1024m memory, 1 cpu-vcore>
> namenode2 <4096m memory, 1 cpu-vcore>
> namenode3 <4096m memory, 1 cpu-vcore>
> namenode4 <1024m memory, 4 cpu-vcores>
> namenode5 <1024m memory, 4 cpu-vcores>
> Start one job with 10 maps and 10 reduces, using the following conf:
> yarn.app.mapreduce.am.resource.mb=1024
> yarn.app.mapreduce.am.resource.cpu-vcores=1
> mapreduce.map.memory.mb=1024
> mapreduce.reduce.memory.mb=1024
> mapreduce.map.cpu.vcores=1
> mapreduce.reduce.cpu.vcores=1
> After some map tasks finished, 4 reduce tasks started, but there were still some map tasks in scheduledRequests.
> At this time, the resource usage of the 5 nodemanagers was as below.
> NodeManager, Memory Used, Vcores Used, Memory Avail, Vcores Avail
> namenode1,   1024m,       1,           0m,           0
> namenode2,   1024m,       1,           3072m,        0
> namenode3,   1024m,       1,           3072m,        0
> namenode4,   1024m,       1,           0m,           3
> namenode5,   1024m,       1,           0m,           3
> So the AM tries to start the remaining map tasks.
> In RMContainerAllocator, the availableResources obtained from ApplicationMasterService is <6144m, 6 cpu-vcores>.
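> To make the mismatch concrete, here is a minimal runnable sketch (plain Java with made-up names, not Hadoop APIs) showing how the summed headroom looks sufficient while no single node fits a <1024m, 1 vcore> map task:
> {code:java}
> public class HeadroomMismatch {
>   public static void main(String[] args) {
>     // Per-node available resources from the table above: {memoryMb, vcores}.
>     int[][] nodeAvail = { {0, 0}, {3072, 0}, {3072, 0}, {0, 3}, {0, 3} };
>     int mapMemMb = 1024, mapVcores = 1;  // one map task's request
>
>     int headroomMem = 0, headroomVcores = 0;
>     boolean anyNodeFits = false;
>     for (int[] n : nodeAvail) {
>       headroomMem += n[0];               // the headroom the AM sees is a sum
>       headroomVcores += n[1];
>       anyNodeFits |= (n[0] >= mapMemMb && n[1] >= mapVcores);
>     }
>
>     // Prints headroom=<6144m, 6 vcores>, anyNodeFits=false: the aggregate
>     // view says "a map fits", but no actual node can host the container.
>     System.out.println("headroom=<" + headroomMem + "m, " + headroomVcores
>         + " vcores>, anyNodeFits=" + anyNodeFits);
>   }
> }
> {code}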
> Then RMContainerAllocator thinks there is enough resource to start one map task, so it will not try to preempt a reduce task. But in fact no single nodemanager has enough resources available to run one map task. In this case the AM fails to obtain containers for the remaining map tasks, and since the reduce tasks are not preempted, their resources are never released, so the job hangs forever.
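> The decision boils down to roughly the following headroom check (a simplified sketch of the idea, with illustrative names, not the literal RMContainerAllocator source):
> {code:java}
> // Simplified sketch: reduces are only preempted when the aggregate
> // headroom cannot fit a single map request. All names here are
> // illustrative, not the exact 2.7.1 identifiers.
> static boolean shouldPreemptReduces(int pendingMaps,
>                                     int headroomMemMb, int headroomVcores,
>                                     int mapMemMb, int mapVcores) {
>   boolean mapSeemsToFit = headroomMemMb >= mapMemMb
>                        && headroomVcores >= mapVcores;
>   // With headroom <6144m, 6 vcores> and a <1024m, 1 vcore> map request,
>   // mapSeemsToFit is true, so no reduce is preempted and the job hangs.
>   return pendingMaps > 0 && !mapSeemsToFit;
> }
> {code}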
> I think the problem is that the overall resource headroom is not enough for the AM to make the right decision on whether to preempt reduce tasks. We need to provide more information to the AM, e.g. add a new API in AllocateResponse that returns the available resources on each nodemanager. But this approach might cost too much overhead.
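> For illustration only, such an addition could look roughly like this (the new method is hypothetical, it does not exist in Hadoop today):
> {code:java}
> import java.util.List;
> import org.apache.hadoop.yarn.api.records.NodeReport;
> import org.apache.hadoop.yarn.api.records.Resource;
>
> public abstract class AllocateResponse {
>   // Existing: cluster-wide headroom, which is what misleads the AM.
>   public abstract Resource getAvailableResources();
>
>   // Hypothetical new method: per-NodeManager available resources, so the
>   // AM could test whether any single node actually fits a map request.
>   public abstract List<NodeReport> getNodeAvailableResources();
> }
> {code}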
> Any ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
