hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
Date Fri, 04 Sep 2015 22:01:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731505#comment-14731505 ]

Wangda Tan commented on YARN-1651:

Hi Meng,
Thanks for the comments:

bq. We probably need to address this properly in the JIRA that tracks container resource increase
roll back. (I think Container resource increase expiration should be tracked as a Scheduler
Event, e.g., SchedulerEventType.CONTAINER_INCREASE_EXPIRE)
I think we can do that either by adding CONTAINER_INCREASE_EXPIRE or by calling the scheduler's
decrease directly from RMContainer. I'm not sure which one is better; let's figure it out when
doing it.
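The first option above can be sketched as follows. This is a hypothetical, self-contained model of routing an increase-expiration through a scheduler event type; the names (ExpirySketch, rolledBack) are illustrative and do not come from the actual YARN-1651 patch:

```java
import java.util.ArrayList;
import java.util.List;

enum SchedulerEventType {
    NODE_ADDED, NODE_REMOVED, APP_ADDED,
    CONTAINER_INCREASE_EXPIRE  // proposed new event type for expired increases
}

class SchedulerEvent {
    final SchedulerEventType type;
    final String containerId;
    SchedulerEvent(SchedulerEventType type, String containerId) {
        this.type = type;
        this.containerId = containerId;
    }
}

class ExpirySketch {
    final List<String> rolledBack = new ArrayList<>();

    // Central dispatch: when an increase token expires unused, the
    // scheduler rolls the container back to its pre-increase resource.
    void handle(SchedulerEvent event) {
        if (event.type == SchedulerEventType.CONTAINER_INCREASE_EXPIRE) {
            rolledBack.add(event.containerId);  // stand-in for a decrease call
        }
    }
}
```

The alternative (calling decrease directly from RMContainer) would skip the event hop but couples RMContainer to the scheduler implementation.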

bq. It seems that this function throws exception whenever there is a duplicated id. Shall
we handle the case where if there are both increase and decrease requests for the same id,
we can ignore the increase but keep the decrease request?
I thought about this before. When two requests with the same containerId but different target
resources exist, it's hard to decide which one should be chosen, and it isn't an expected
allocate request either. So I prefer to reject both.
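The "reject both" policy can be illustrated with a standalone sketch: if any container id appears more than once across the combined increase and decrease lists, the whole update fails validation rather than the scheduler guessing which target resource was meant. The class and method names here are hypothetical:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateIdCheck {
    // Returns true only when every container id appears at most once
    // across the combined increase + decrease request lists.
    static boolean validateUniqueIds(List<String> increaseIds,
                                     List<String> decreaseIds) {
        Set<String> seen = new HashSet<>();
        for (String id : increaseIds) {
            if (!seen.add(id)) return false;  // duplicate within increases
        }
        for (String id : decreaseIds) {
            if (!seen.add(id)) return false;  // id also present as an increase
        }
        return true;
    }
}
```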

bq. Will it be better to combine all sanity checks into one function

bq. For validateIncreaseDecreaseRequest, we don't check minimum allocation now, is it intended?
Yes, it's intended: we will normalize the resource later, so there is no need to throw an exception.
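A minimal sketch of why the minimum-allocation check is redundant here: normalization later rounds the request up to a multiple of the minimum allocation and clamps it into the configured range, so a below-minimum request is silently lifted to the minimum. This models the usual YARN normalization rule in isolation; it is not the actual scheduler code:

```java
class NormalizeSketch {
    // Normalize a memory request the way YARN schedulers typically do:
    // round up to the nearest multiple of the minimum allocation...
    static int normalizeMemory(int requestedMb, int minMb, int maxMb) {
        int normalized = ((requestedMb + minMb - 1) / minMb) * minMb;
        // ...then clamp into the configured [min, max] range.
        return Math.max(minMb, Math.min(normalized, maxMb));
    }
}
```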

bq. This function is used by both pullNewlyIncreasedContainers(), and pullNewlyDecreasedContainers().
Why do we need to call updateContainerAndNMToken for decreased containers? It also unnecessarily
sends an ACQUIRE_UPDATED_CONTAINER event for every decreased container?
This is mainly to keep the logic correct and consistent. We don't use the container token for now,
but I think it should be updated before returning to the app. Unless doing this causes a performance
issue, I prefer to keep the existing behavior.

bq. We should probably check null before adding updatedContainer?
It seems there's no need for the null check here; when would it become null? I prefer to keep it
as-is, and it will throw an NPE if any fatal issue happens.

bq. RMNodeImpl.pullNewlyIncreasedContainers()

bq. AppSchedulingInfo#notifyContainerStopped not being used.
Removed, we handled this in LeafQueue#completedContainer.

bq. I think the following is a typo, should be if (cannotAllocateAnything), right?
Correct, fixed.

bq. Not sure if I understand the logic. Why only break when node.getReservedContainer() ==
null? Shouldn't we break out of the loop here no matter what?
Nice catch! I fixed this; we should break when we have allocated or reserved anything.
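The corrected loop-exit condition can be sketched like this: the loop stops at the first attempt that either allocates or reserves, instead of only breaking when the node has no reserved container. Entirely illustrative; the names do not match the patch:

```java
import java.util.List;

class AssignLoopSketch {
    enum Result { ALLOCATED, RESERVED, SKIPPED }

    // Walks assignment attempts on a node and stops at the first one
    // that either allocates or reserves (the fix: break on both).
    // Returns how many attempts were examined.
    static int tryAssign(List<Result> outcomes) {
        int examined = 0;
        for (Result r : outcomes) {
            examined++;
            if (r == Result.ALLOCATED || r == Result.RESERVED) {
                break;  // allocated or reserved something: stop here
            }
        }
        return examined;
    }
}
```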

bq. I think earlier in the allocateIncreaseRequest() function, if a new increase is successfully
allocated, application.increaseContainer(increaseRequest) will have removed the increase request
Another nice catch! Yes, it is already handled in application.increaseContainer.

bq. RMContainerImpl...Shouldn't it be changed to...
Yes, it should do as you said, updated.

bq. Also, is container.containerIncreased really needed?
It's needed because we don't know whether an acquired event is for an increased or a decreased
container. I added isIncreaseContainer to the acquire event (now RMContainerUpdatesAcquiredEvent)
and removed RMContainerImpl.containerIncreased.
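A hypothetical model of that change: the acquire event itself carries the increase/decrease flag, so a per-container boolean on RMContainerImpl is no longer needed. The field and class shapes below are illustrative, not the actual patch:

```java
// Standalone stand-in for RMContainerUpdatesAcquiredEvent: the event
// carries whether the acquired update was an increase or a decrease.
class ContainerUpdatesAcquiredEvent {
    final String containerId;
    final boolean isIncreaseContainer;  // true = increase, false = decrease

    ContainerUpdatesAcquiredEvent(String containerId,
                                  boolean isIncreaseContainer) {
        this.containerId = containerId;
        this.isIncreaseContainer = isIncreaseContainer;
    }
}
```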

Typos: fixed.

> CapacityScheduler side changes to support increase/decrease container resource.
> -------------------------------------------------------------------------------
>                 Key: YARN-1651
>                 URL: https://issues.apache.org/jira/browse/YARN-1651
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-1651-1.YARN-1197.patch, YARN-1651-2.YARN-1197.patch

This message was sent by Atlassian JIRA
