hadoop-yarn-issues mailing list archives

From "Rohith Sharma K S (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
Date Fri, 26 Jun 2015 17:56:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603308#comment-14603308 ]

Rohith Sharma K S commented on YARN-3849:
-----------------------------------------

Below is the log trace for the issue.

In our cluster there are 3 NodeManagers, each with resource {{<memory:327680, vCores:35>}}. The total
cluster resource is {{clusterResource: <memory:983040, vCores:105>}}, and the CapacityScheduler
is configured with two queues, *default* and *QueueA*.
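As a quick check of this setup, here is a minimal Python sketch. It assumes a simplified, hypothetical `Resource` class and `dominant_share` helper (not the Hadoop API) and shows that the cluster total is three times one NodeManager, and that a container of <memory:1024, vCores:10> (the size used by both apps below) is CPU-dominant in this cluster:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for YARN's <memory, vCores> pair (memory in MB)."""
    memory: int
    vcores: int

    def __add__(self, other: "Resource") -> "Resource":
        return Resource(self.memory + other.memory, self.vcores + other.vcores)


def dominant_share(r: Resource, total: Resource) -> float:
    """Largest per-dimension share of r in total (the DRC notion of dominance)."""
    return max(r.memory / total.memory, r.vcores / total.vcores)


node = Resource(327680, 35)          # one NodeManager
cluster = node + node + node         # 3 NodeManagers
container = Resource(1024, 10)       # container size requested by the apps

mem_share = container.memory / cluster.memory    # 1024 / 983040
cpu_share = container.vcores / cluster.vcores    # 10 / 105
print(cluster)                                   # Resource(memory=983040, vcores=105)
print(dominant_share(container, cluster) == cpu_share)  # True: vCores dominate
```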


# Application app-1 is submitted to queue default and starts running 10 containers: its AM
container plus 9 task containers, each task container with {{resource: <memory:1024, vCores:10>}}.
So the total used is {{usedResources=<memory:10240, vCores:91>}}.
{noformat}
default user=spark used=<memory:10240, vCores:91> numContainers=10 headroom = <memory:1024,
vCores:10> user-resources=<memory:10240, vCores:91>
Re-sorting assigned queue: root.default stats: default: capacity=0.5, absoluteCapacity=0.5,
usedResources=<memory:10240, vCores:91>, usedCapacity=1.7333333, absoluteUsedCapacity=0.8666667,
numApps=1, numContainers=10
{noformat}
*NOTE: Resource allocation is CPU-dominant.*
With these 10 containers running, the available resources on the NodeManagers are:
{noformat}
linux-174, available: <memory:323584, vCores:4>
linux-175, available: <memory:324608, vCores:5>
linux-223, available: <memory:324608, vCores:5>
{noformat}
# Application app-2 is submitted to QueueA. Its ApplicationMaster container starts running on
linux-174, after which that NodeManager has {{available: <memory:322560, vCores:3>}}
 {noformat}
Assigned container container_1435072598099_0002_01_000001 of capacity <memory:1024, vCores:1>
on host linux-174:26009, which has 5 containers, <memory:5120, vCores:32> used and <memory:322560,
vCores:3> available after allocation | SchedulerNode.java:154
linux-174, available: <memory:322560, vCores:3>
{noformat}
# The preemption policy then performs the following calculation:
{noformat}
2015-06-23 23:20:51,127 NAME: QueueA CUR: <memory:0, vCores:0> PEN: <memory:0, vCores:0>
GAR: <memory:491520, vCores:52> NORM: NaN IDEAL_ASSIGNED: <memory:0, vCores:0>
IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0> UNTOUCHABLE:
<memory:0, vCores:0> PREEMPTABLE: <memory:0, vCores:0>
2015-06-23 23:20:51,128 NAME: default CUR: <memory:851968, vCores:91> PEN: <memory:0,
vCores:0> GAR: <memory:491520, vCores:52> NORM: 1.0 IDEAL_ASSIGNED: <memory:851968,
vCores:91> IDEAL_PREEMPT: <memory:0, vCores:0> ACTUAL_PREEMPT: <memory:0, vCores:0>
UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:360448, vCores:39>
{noformat}
In the above log, observe that for queue default *CUR is <memory:851968, vCores:91>*, while
the actual usage is *usedResources=<memory:10240, vCores:91>*. Only the CPU value matches;
the memory value does not. CUR, GAR and PREEMPTABLE are calculated as follows:
#* CUR = {{clusterResource: <memory:983040, vCores:105>}} * {{absoluteUsedCapacity(0.8666667)}} = {{<memory:851968, vCores:91>}}
#* GAR = {{clusterResource: <memory:983040, vCores:105>}} * {{absoluteCapacity(0.5)}} = {{<memory:491520, vCores:52>}}
#* PREEMPTABLE = CUR - GAR = {{<memory:360448, vCores:39>}}
# App-2 requests containers with {{resource: <memory:1024, vCores:10>}}, so the preemption
cycle computes how much has to be preempted (toBePreempted):
{noformat}
2015-06-23 23:21:03,131 | DEBUG | SchedulingMonitor (ProportionalCapacityPreemptionPolicy)
| 1435072863131:  NAME: default CUR: <memory:851968, vCores:91> PEN: <memory:0, vCores:0>
GAR: <memory:491520, vCores:52> NORM: NaN IDEAL_ASSIGNED: <memory:491520, vCores:52>
IDEAL_PREEMPT: <memory:97043, vCores:10> ACTUAL_PREEMPT: <memory:0, vCores:0>
UNTOUCHABLE: <memory:0, vCores:0> PREEMPTABLE: <memory:360448, vCores:39>
{noformat}
Observe *IDEAL_PREEMPT: <memory:97043, vCores:10>*: app-2 in QueueA needs only 10 vCores to be
preempted, yet the policy also asks to preempt 97043 MB of memory, even though enough memory
is already available.
Below are the calculations that produce IDEAL_PREEMPT:
#* totalPreemptionAllowed = clusterResource: <memory:983040, vCores:105> * 0.1 = <memory:98304, vCores:10> (105 * 0.1 = 10.5, truncated to an integer)
#* totPreemptionNeeded = CUR - IDEAL_ASSIGNED = CUR: <memory:851968, vCores:91>
#* scalingFactor = Resources.divide(drc, <memory:491520, vCores:52>, <memory:98304, vCores:10>, <memory:851968, vCores:91>) = 0.2 / 1.75 (dominant shares relative to <memory:491520, vCores:52>)
scalingFactor = 0.114285715
#* toBePreempted = CUR: <memory:851968, vCores:91> * scalingFactor(0.114285715) = <memory:97367, vCores:10>
With the scalingFactor of 0.1139045128455529, the same product gives the logged {{resource-to-obtain = <memory:97043, vCores:10>}}
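The arithmetic above can be reproduced with a small Python sketch. It assumes simplified, hypothetical stand-ins: `multiply` models Resources.multiply (each dimension scaled by one scalar and truncated to int), and `drc_divide` models Resources.divide under the DominantResourceCalculator as a ratio of dominant shares. The arguments passed to the divide call follow the values listed above, not the actual policy code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for YARN's <memory, vCores> pair (memory in MB)."""
    memory: int
    vcores: int

    def __sub__(self, other: "Resource") -> "Resource":
        return Resource(self.memory - other.memory, self.vcores - other.vcores)


def multiply(r: Resource, by: float) -> Resource:
    # Scale every dimension by the same scalar and truncate to int.
    return Resource(int(r.memory * by), int(r.vcores * by))


def drc_share(base: Resource, r: Resource) -> float:
    # Dominant share of r relative to base (max over the two dimensions).
    return max(r.memory / base.memory, r.vcores / base.vcores)


def drc_divide(base: Resource, lhs: Resource, rhs: Resource) -> float:
    # Ratio of dominant shares, modelling Resources.divide(drc, base, lhs, rhs).
    return drc_share(base, lhs) / drc_share(base, rhs)


cluster = Resource(983040, 105)
used = Resource(10240, 91)                    # usedResources of queue "default"

# absoluteUsedCapacity collapses both dimensions into one scalar (91/105).
abs_used = drc_divide(cluster, used, cluster)

cur = multiply(cluster, abs_used)             # Resource(memory=851968, vcores=91)
gar = multiply(cluster, 0.5)                  # Resource(memory=491520, vcores=52)
preemptable = cur - gar                       # Resource(memory=360448, vcores=39)

allowed = multiply(cluster, 0.1)              # Resource(memory=98304, vcores=10)
scaling = drc_divide(gar, allowed, cur)       # 0.2 / 1.75 = 0.1142857...
to_be_preempted = multiply(cur, scaling)      # Resource(memory=97367, vcores=10)

print(abs_used, cur, preemptable, scaling, to_be_preempted, sep="\n")
```

Because absoluteUsedCapacity is a single scalar taken from the dominant (CPU) share, multiplying the whole cluster resource by it inflates the memory component of CUR from 10240 MB to 851968 MB, while the vCores component stays correct.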

*So the problem lies in the steps below:*
# As [~sunilg] said, usedResources is <memory:10240, vCores:91>, but the preemption policy wrongly
calculates the current used capacity as {{<memory:851968, vCores:91>}}. This is mainly because
the preemption policy derives current usage by multiplying the cluster resource by
absoluteUsedCapacity, a single fraction. With the DominantResourceCalculator, that fraction
reflects only the dominant resource, so the result is wrong for the other resource (here,
memory). I think this fraction should not be used, since it causes problems with DRC
(multi-dimensional resources); instead we should use the used resources from CSQueue.
# Even setting step 1 aside, toBePreempted is calculated as resource-to-obtain: <memory:97043,
vCores:10>. When a container is marked for preemption, the preemption policy subtracts the marked
container's resources. In the above log, resource-to-obtain therefore becomes *<memory:96019,
vCores:0>*, since each container is <memory:1024, vCores:10>. On the next container marking,
MEMORY has become the DOMINANT resource, and the policy tries to fulfil the remaining 96019 MB of
memory even though the CPU requirement is already met. The dominant resource changes: the
scheduler allocates containers with CPU as the dominant resource, but the preemption policy then
pursues MEMORY as the dominant resource, which causes the problem. As a result, all the non-AM
containers are killed.
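Step 2 can be illustrated with a minimal Python sketch. It assumes the marking loop reduces to "keep selecting containers while resource-to-obtain is still greater than zero when comparing dominant shares"; `mark_containers` is a hypothetical simplification, not the actual ProportionalCapacityPreemptionPolicy code, and app-1's non-AM containers are assumed to be 9 x <memory:1024, vCores:10>.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for YARN's <memory, vCores> pair (memory in MB)."""
    memory: int
    vcores: int

    def __sub__(self, other: "Resource") -> "Resource":
        return Resource(self.memory - other.memory, self.vcores - other.vcores)


CLUSTER = Resource(983040, 105)
NONE = Resource(0, 0)


def drc_share(r: Resource) -> float:
    # Dominant share of r in the cluster: the larger of the two dimensions.
    return max(r.memory / CLUSTER.memory, r.vcores / CLUSTER.vcores)


def mark_containers(to_obtain: Resource, candidates: list) -> tuple:
    """Hypothetical simplification of the marking loop: select containers
    until to_obtain is no longer greater than zero under DRC."""
    marked = []
    for c in candidates:
        if drc_share(to_obtain) <= drc_share(NONE):
            break  # nothing left to obtain
        marked.append(c)
        to_obtain = to_obtain - c  # subtract the marked container
    return marked, to_obtain


# app-1's non-AM containers, assumed to be 9 x <memory:1024, vCores:10>
tasks = [Resource(1024, 10)] * 9
marked, left = mark_containers(Resource(97043, 10), tasks)
print(len(marked), left)  # 9 Resource(memory=87827, vcores=-80)
```

In this sketch the vCores component reaches 0 after the first container and then goes negative, but the dominant share is then taken from the remaining memory, so the loop keeps going and marks every task container, as described in step 2.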

*And the problem is not only that all the non-AM containers are killed; it continues in a loop :-(
When app-2 starts running containers in QueueA, app-1 requests containers, and the preemption
policy then kills all the non-AM containers of app-2. This repeats forever: the two applications
keep killing each other's tasks in a loop, and neither application ever completes.*

> Too much of preemption activity causing continuos killing of containers across queues
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-3849
>                 URL: https://issues.apache.org/jira/browse/YARN-3849
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Critical
>
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant Resource policy is used.
> 1. An app is submitted in QueueA which consumes the full cluster capacity.
> 2. After an app is submitted in QueueB, there is some demand, which invokes preemption in QueueA.
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we observed that all containers other than the AM are killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free space. But there is updated demand from the app in QueueA, which lost its containers earlier, and preemption now kicks in on QueueB.
> Steps 3 and 4 happen continuously in a loop, so none of the apps complete.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
