hadoop-yarn-issues mailing list archives

From "Tao Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9598) Make reservation work well when multi-node enabled
Date Mon, 10 Jun 2019 05:29:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859709#comment-16859709 ]

Tao Yang commented on YARN-9598:

Thanks [~jutia] for the
for this, if re-reservation is disabled, shouldAllocOrReserveNewContainer may return false
in most cases, and thus even if the scheduler has a chance to look up other candidates, it may
not assign containers.
IIUC, the shouldAllocOrReserveNewContainer variable is used to reserve more resources than
required. I think that is not only unnecessary (we can see and choose available resources
from all nodes) but also harmful in multi-node scenarios: this logic can let a low-priority
app get many more resources than it needs, and they won't be released until all of its needs
are satisfied. That is inefficient for cluster utilization and can block requests from
high-priority apps. On the other hand, disabling re-reservation only makes the scheduler skip
reserving the same container repeatedly and try to allocate on other nodes; it won't affect
normal scheduling for this app or later apps. Thoughts?
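To make the trade-off concrete, here is a minimal Java sketch under strong simplifying assumptions: each node is reduced to a free-capacity number, the candidate order is fixed across passes, and the names ReservationPassSketch, Outcome and schedule are hypothetical, not CapacityScheduler APIs. It does not model shouldAllocOrReserveNewContainer or the proposal/commit flow; it only contrasts re-reserving on the first node that cannot fit with skipping that node, which is the role LOCALITY_SKIPPED would play.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of one multi-node scheduling pass for a
 * single container request. Nodes are plain free-capacity numbers in a fixed
 * candidate order; none of these names are Hadoop classes.
 */
final class ReservationPassSketch {

    enum Outcome { ALLOCATED, RESERVED, SKIPPED }

    static final class Result {
        final Outcome outcome;
        final int node; // index of the node acted on, or -1 if none

        Result(Outcome outcome, int node) {
            this.outcome = outcome;
            this.node = node;
        }
    }

    /**
     * reReserve == true: the first candidate that cannot fit the request is
     * (re-)reserved and the pass ends, so later candidates are never tried.
     * reReserve == false: a node that cannot fit is skipped (playing the role
     * of LOCALITY_SKIPPED) and the next candidate is tried.
     */
    static Result schedule(List<Integer> freeByNode, int request, boolean reReserve) {
        for (int i = 0; i < freeByNode.size(); i++) {
            if (freeByNode.get(i) >= request) {
                return new Result(Outcome.ALLOCATED, i);
            }
            if (reReserve) {
                return new Result(Outcome.RESERVED, i);
            }
            // Not enough room here: skip this node and look at the next candidate.
        }
        return new Result(Outcome.SKIPPED, -1);
    }
}
```

With free capacities [2, 8, 8] and a request of 4, the re-reserving variant lands on node 0 on every call because the candidate order never changes, while the skipping variant allocates on node 1.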
I'm wondering why we don't just handle this case as in the single-node case, and change the logic in CapacityScheduler#allocateContainersOnMultiNodes
like below
[~cheersyang] and I discussed moving allocateFromReservedContainer ahead in YARN-9432, to avoid
trying to allocate from reserved containers many times in a single scheduling pass, and chose
not to do that because it wouldn't be a tiny change and could affect the current scheduling
process. We preferred to just fix the problem without changing more, and the same applies to this issue.

> Make reservation work well when multi-node enabled
> --------------------------------------------------
>                 Key: YARN-9598
>                 URL: https://issues.apache.org/jira/browse/YARN-9598
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, image-2019-06-10-11-37-44-975.png
> This issue is to solve problems with reservation when multi-node is enabled:
>  # As discussed in YARN-9576, a re-reservation proposal may always be generated on the
same node and break scheduling for this app and later apps. I think re-reservation is
unnecessary, and we can replace it with LOCALITY_SKIPPED to give the scheduler a chance to
look up the following candidates for this app when multi-node is enabled.
>  # The scheduler iterates over all nodes and tries to allocate for the reserved container in LeafQueue#allocateFromReservedContainer.
There are two problems here:
>  ** Only the node of the reserved container should be taken as a candidate, instead of all nodes,
when calling FiCaSchedulerApp#assignContainers; otherwise the scheduler may later generate a reservation-fulfilled
proposal on another node, which will always be rejected in FiCaScheduler#commonCheckContainerAllocation.
>  ** The assignment returned by FiCaSchedulerApp#assignContainers is never null, even
if it was just skipped, which breaks the normal scheduling process for this leaf queue because
of the if clause in LeafQueue#assignContainers: "if (null != assignment) { return assignment; }"
>  # Nodes that have already been reserved should be skipped when iterating over candidates in RegularContainerAllocator#allocate;
otherwise the scheduler may generate allocation or reservation proposals on these nodes, which will
always be rejected in FiCaScheduler#commonCheckContainerAllocation.
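The three points above can be sketched in Java as follows. This is a hypothetical model, not the YARN-9598 patch: nodes are String ids, assignments are Strings, and ReservedNodeSketch, SKIPPED, candidatesForReserved, candidatesForNewRequest, leafQueueAssign and nullIfSkipped are illustrative names only. leafQueueAssign mirrors the quoted early return in LeafQueue#assignContainers; mapping a skipped result to null is just one assumed way to keep normal scheduling running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical, simplified model of the fixes described above. Nodes are
 * plain String ids and assignments are plain Strings; none of these names
 * are Hadoop APIs.
 */
final class ReservedNodeSketch {

    /** Stand-in for a non-null assignment that only means "skipped". */
    static final String SKIPPED = "SKIPPED";

    /** Point 2, first problem: for a reserved container, the only candidate is the node holding it. */
    static List<String> candidatesForReserved(String reservedNode) {
        return List.of(reservedNode);
    }

    /** Point 3: for new allocations or reservations, drop nodes that are already reserved. */
    static List<String> candidatesForNewRequest(List<String> allNodes, Set<String> reservedNodes) {
        List<String> candidates = new ArrayList<>();
        for (String node : allNodes) {
            if (!reservedNodes.contains(node)) {
                candidates.add(node);
            }
        }
        return candidates;
    }

    /**
     * Point 2, second problem: any non-null result from the reserved-container
     * step ends the pass, so normal scheduling only runs when that step
     * returns null.
     */
    static String leafQueueAssign(String fromReserved, Supplier<String> normalScheduling) {
        String assignment = fromReserved;
        if (null != assignment) {
            return assignment; // same shape as the quoted check
        }
        return normalScheduling.get();
    }

    /** One possible fix (an assumption, not necessarily the patch): map "skipped" to null. */
    static String nullIfSkipped(String assignment) {
        return SKIPPED.equals(assignment) ? null : assignment;
    }
}
```

In this model, leafQueueAssign(SKIPPED, ...) returns "SKIPPED" without ever calling the normal-scheduling supplier, which is the blocking behavior described in the second sub-point; passing nullIfSkipped(SKIPPED) instead lets the supplier run.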

This message was sent by Atlassian JIRA
