hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
Date Wed, 01 Jul 2015 18:54:06 GMT

https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610815#comment-14610815

Wangda Tan commented on YARN-3849:

Thanks [~sunilg],
Some comments:

1) It seems we don't need useDominantResourceCalculator/rcDefault/rcDominant in TestP..Policy;
passing a boolean parameter to buildPolicy should be enough. You can also overload buildPolicy
to avoid too many changes.
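A minimal sketch of the overload idea (hypothetical names, not the real test class): existing call sites keep the old signature, and only the new DRC tests pass the extra boolean, so the test no longer needs to hold rcDefault/rcDominant fields.

```java
// Hypothetical sketch of overloading buildPolicy in the test: the old
// signature delegates to the new one with the default calculator, so
// existing tests are untouched and DRC tests just pass `true`.
class PolicyBuilderSketch {
  boolean useDominantRC;

  // Existing signature: delegates with the default resource calculator.
  void buildPolicy(String queuesConfig, String appsConfig) {
    buildPolicy(queuesConfig, appsConfig, false);
  }

  // New overload: a single boolean selects DominantResourceCalculator.
  void buildPolicy(String queuesConfig, String appsConfig,
      boolean useDominantResourceCalculator) {
    this.useDominantRC = useDominantResourceCalculator;
    // ... the rest of the mock scheduler setup would go here ...
  }
}
```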

2) testPreemptionWithVCoreResource seems incorrect: root.used != A.used + B.used
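The invariant behind this comment is that a parent queue's used resource must equal the sum of its children's, in both dimensions when vcores are in play. A minimal sketch of that check with a stand-in resource type (hypothetical; the real test uses YARN's Resource/Resources classes):

```java
// Minimal (memory, vcores) stand-in for YARN's Resource, to show the
// consistency the mock queue config must satisfy: root.used == A.used + B.used.
class Res {
  final int memory, vcores;

  Res(int memory, int vcores) {
    this.memory = memory;
    this.vcores = vcores;
  }

  // Component-wise addition, like Resources.add().
  static Res add(Res a, Res b) {
    return new Res(a.memory + b.memory, a.vcores + b.vcores);
  }

  // True when both dimensions match.
  boolean sameAs(Res o) {
    return memory == o.memory && vcores == o.vcores;
  }
}
```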

3) TestP..PolicyForNodePartitions:
One comment is wrong:
        + "(1,1:2,n1,x,20,false);" + // 80 * x in n1
        "b\t" // app4 in b
        + "(1,1:2,n2,,80,false)"; // 20 default in n2
It should be 20 * x and 80 default.
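With just the two inline comments corrected as noted, the quoted lines would read:

```
        + "(1,1:2,n1,x,20,false);" + // 20 * x in n1
        "b\t" // app4 in b
        + "(1,1:2,n2,,80,false)"; // 80 default in n2
```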

4) It seems the DRC setting is missing in TestP..PolicyForNodePartitions, could you check?

> Too much of preemption activity causing continuos killing of containers across queues
> -------------------------------------------------------------------------------------
>                 Key: YARN-3849
>                 URL: https://issues.apache.org/jira/browse/YARN-3849
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Sunil G
>            Priority: Critical
>         Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch
> Two queues are used. Each queue is given a capacity of 0.5. The Dominant Resource policy is used.
> 1. An app is submitted in QueueA, consuming the full cluster capacity.
> 2. After an app is submitted in QueueB, there is some demand, which invokes preemption in QueueA.
> 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we observed that all containers other than the AM are killed in QueueA.
> 4. Now the app in QueueB tries to take over the cluster with the current free space. But there is updated demand from the app in QueueA, which lost its containers earlier, and preemption now kicks in on QueueB.
> The scenario in steps 3 and 4 keeps repeating in a loop, so neither app completes.

This message was sent by Atlassian JIRA
