hadoop-yarn-issues mailing list archives

From "Zian Chen (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (YARN-8138) Add unit test to validate queue priority preemption works under node partition.
Date Wed, 11 Apr 2018 22:44:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8138:
----------------------------
    Comment: was deleted

(was: Investigated this issue and wrote a UT to reproduce it. According to the UT, preemption does
happen after application 3 is submitted, just not in the way the test scenario expects. There are
several issues we need to clarify here.
 # When we set container memory sizes, we should set them as multiples of 1024 MB; otherwise, the
scheduler rounds each request up to the nearest multiple of 1024 MB that is at least as large as the
requested size. For example, app3 requested a 750 MB AM container but will actually get a 1024 MB
container (see the rounding sketch after this list).
 # According to the log, preemption appears not to happen, but it actually happens after a long
delay (roughly one minute). The reason is that when the "yarn.scheduler.capacity.ordering-policy.priority-utilization.underutilized-preemption.reserved-container-delay-ms"
property is set, a reserved container is not allocated before that timeout expires, so preemption is
delayed by at least the same amount.
 # Even once preemption happens, we should not expect A3 to be able to launch all of its requested
containers, because the amount of resource A3 can get is limited by the minimum guaranteed resource
of the queue it was submitted to. In this case we only expect two containers to be preempted, since
Queue B reaches its minimum guaranteed resource (50% of the cluster resource) after two containers
are preempted from Queue A.
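For item 1, here is a minimal, self-contained sketch of the rounding behaviour described above. This
is an illustration only, not the actual scheduler code, and it assumes yarn.scheduler.minimum-allocation-mb
keeps its default of 1024 MB.
{code:java}
// Illustration only: memory requests are rounded up to the nearest multiple of the
// minimum allocation (assumed default: yarn.scheduler.minimum-allocation-mb = 1024).
public class NormalizeSketch {
  static int normalizeMemoryMb(int requestedMb, int minAllocMb) {
    // Round up to the next multiple of minAllocMb.
    return ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
  }

  public static void main(String[] args) {
    System.out.println(normalizeMemoryMb(750, 1024));  // 1024 (app3's 750 MB AM request)
    System.out.println(normalizeMemoryMb(1500, 1024)); // 2048
    System.out.println(normalizeMemoryMb(5000, 1024)); // 5120
  }
}
{code}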

So my suggestion is to recheck the test scenario with the issues above in mind and adjust the
settings accordingly; the test should then pass.
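
When adjusting the settings, the property from item 2 is the main timing knob. Below is a rough
sketch of the gate it introduces; again this is an illustration rather than the actual
CapacityScheduler code, and the 60000 ms default used here is an assumption (the real default is
defined in CapacitySchedulerConfiguration).
{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustration only: a reserved container is not allocated (and so does not move
// preemption forward) until it has been reserved for at least reserved-container-delay-ms.
public class ReservedDelaySketch {
  static final String RESERVED_DELAY_KEY =
      "yarn.scheduler.capacity.ordering-policy.priority-utilization."
          + "underutilized-preemption.reserved-container-delay-ms";

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    long delayMs = conf.getLong(RESERVED_DELAY_KEY, 60000L); // assumed default
    long reservedAtMs = System.currentTimeMillis() - 30000L;  // reserved 30 s ago
    boolean eligible = System.currentTimeMillis() - reservedAtMs >= delayMs;
    System.out.println("reserved container eligible for allocation: " + eligible); // false
  }
}
{code}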

 

[~leftnoteasy], could you share your opinion as well? Thanks)

> Add unit test to validate queue priority preemption works under node partition.
> -------------------------------------------------------------------------------
>
>                 Key: YARN-8138
>                 URL: https://issues.apache.org/jira/browse/YARN-8138
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Charan Hebri
>            Assignee: Zian Chen
>            Priority: Minor
>         Attachments: YARN-8138.001.patch, YARN-8138.002.patch
>
>
> There seems to be an issue with pre-emption when using node labels with queue priority.
> Test configuration:
> queue A (capacity=50, priority=1)
> queue B (capacity=50, priority=2)
> both have accessible-node-labels set to x
> A.accessible-node-labels.x.capacity = 50
> B.accessible-node-labels.x.capacity = 50
> Along with this, pre-emption-related properties have been set.
> Test steps:
>  - Set NM memory = 6000MB and containerMemory = 750MB
>  - Submit an application A1 to B, with am-container = container = (6000-750-1500), no. of containers = 2
>  - Submit an application A2 to A, with am-container = 750, container = 1500, no. of containers = (NUM_NM-1)
>  - Kill application A1
>  - Submit an application A3 to B with am-container=container=5000, no. of containers=3
>  - Expectation is that containers are pre-empted from application A2 to A3 but there is no container pre-emption happening
> Container pre-emption is stuck with the message in the RM log,
> {noformat}
> 2018-02-02 11:41:36,974 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to fulfill reservation for application application_1517571510094_0003 on node: XXXXXXXXXX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - Reserved container application=application_1517571510094_0003 resource=<memory:3072, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e cluster=<memory:18000, vCores:3>
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to fulfill reservation for application application_1517571510094_0003 on node: XXXXXXXXXX:25454
> 2018-02-02 11:41:36,984 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - Reserved container application=application_1517571510094_0003 resource=<memory:3072, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e cluster=<memory:18000, vCores:3>
> 2018-02-02 11:41:36,984 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2673)) - Allocation proposal accepted
> 2018-02-02 11:41:36,994 INFO capacity.CapacityScheduler (CapacityScheduler.java:allocateContainerOnSingleNode(1391)) - Trying to fulfill reservation for application application_1517571510094_0003 on node: XXXXXXXXXX:25454
> 2018-02-02 11:41:36,995 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(97)) - Reserved container application=application_1517571510094_0003 resource=<memory:3072, vCores:1> queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@3f04848e cluster=<memory:18000, vCores:3>{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

