hadoop-yarn-issues mailing list archives

From "Tao Yang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-9432) Excess reserved containers may exist for a long time after its request has been cancelled or satisfied when multi-nodes enabled
Date Mon, 01 Apr 2019 11:49:04 GMT

     [ https://issues.apache.org/jira/browse/YARN-9432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-9432:
---------------------------
    Attachment: YARN-9432.001.patch

> Excess reserved containers may exist for a long time after its request has been cancelled or satisfied when multi-nodes enabled
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-9432
>                 URL: https://issues.apache.org/jira/browse/YARN-9432
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9432.001.patch
>
>
> Reserved containers may become excess after their request has been cancelled or satisfied. Excess reserved containers need to be unreserved quickly to release resources for other requests.
> When multi-nodes is disabled, excess reserved containers are released quickly on the next node heartbeat. The call stack is CapacityScheduler#nodeUpdate --> CapacityScheduler#allocateContainersToNode --> CapacityScheduler#allocateContainerOnSingleNode.
> When multi-nodes is enabled, however, excess reserved containers can only be released during the allocation process. The key part of the call stack is LeafQueue#assignContainers --> LeafQueue#allocateFromReservedContainer. As a result, excess reserved containers may not be released until their queue has a pending request and gets a chance to allocate. In the worst case, they are never released and keep holding resources if the queue has no further pending requests.
> To solve this problem, I propose directly killing excess reserved containers when the request is satisfied (in FiCaSchedulerApp#apply) or when the allocation number of resource-requests/scheduling-requests is updated to 0 (in SchedulerApplicationAttempt#updateResourceRequests / SchedulerApplicationAttempt#updateSchedulingRequests).
> Please feel free to give your suggestions. Thanks.
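
The proposal above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in YARN-9432.001.patch: ToyApp, updateRequest, reserve, onAllocated and releaseExcess are hypothetical names that stand in for the real scheduler classes. It shows reservations being dropped eagerly at the two trigger points named in the description (the request is satisfied, or its count is updated to 0), rather than waiting for a later allocation pass on the queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these types are the real YARN classes. */
class ExcessReservationSketch {

    /** Hypothetical stand-in for an application attempt. */
    static class ToyApp {
        // request key -> number of containers still pending
        private final Map<String, Integer> pending = new HashMap<>();
        // request key -> nodes that currently hold a reservation for it
        private final Map<String, List<String>> reserved = new HashMap<>();

        /** Analogue of a request update; a count of 0 means cancelled. */
        void updateRequest(String key, int numContainers) {
            pending.put(key, numContainers);
            if (numContainers == 0) {
                releaseExcess(key);
            }
        }

        /** Records a reservation for the request on the given node. */
        void reserve(String key, String node) {
            reserved.computeIfAbsent(key, k -> new ArrayList<>()).add(node);
        }

        /** Analogue of applying a successful allocation for the request. */
        void onAllocated(String key) {
            int left = Math.max(0, pending.getOrDefault(key, 0) - 1);
            pending.put(key, left);
            if (left == 0) {
                releaseExcess(key);
            }
        }

        /** Drops every reservation still held for a request with nothing pending. */
        private void releaseExcess(String key) {
            reserved.remove(key);
        }

        int reservedCount(String key) {
            List<String> nodes = reserved.get(key);
            return nodes == null ? 0 : nodes.size();
        }
    }

    public static void main(String[] args) {
        ToyApp app = new ToyApp();
        app.updateRequest("r1", 1);
        app.reserve("r1", "node1");
        app.reserve("r1", "node2");
        app.onAllocated("r1"); // request satisfied, reservations released here
        System.out.println("reservations left: " + app.reservedCount("r1"));
    }
}
```

In this model the release happens at the moment the request state changes, so it does not depend on the queue having another pending request.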



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


