hadoop-yarn-issues mailing list archives

From "MENG DING (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
Date Thu, 03 Sep 2015 20:20:46 GMT

    [ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729706#comment-14729706 ]

MENG DING commented on YARN-1651:
---------------------------------

Hi, [~leftnoteasy]

I think it is fine to reuse the Expire event for container increase expiration for now. We
probably need to address this properly in the JIRA that tracks container resource increase
rollback. (I think container resource increase expiration should be tracked as a scheduler
event, e.g., {{SchedulerEventType.CONTAINER_INCREASE_EXPIRE}}.)

I have a few more comments or questions regarding the patch:

* Regarding sanity checks:
** Can the following function be removed? {{ApplicationMaster.checkDuplicatedIncreaseDecreaseRequest()}}
** Regarding {{RMServerUtils.checkDuplicatedIncreaseDecreaseRequest()}}: it seems that this
function throws an exception whenever there is a duplicated id. Shall we handle the case where
there are both an increase and a decrease request for the same id by ignoring the increase but
keeping the decrease request?
** Would it be better to combine all sanity checks into one function, e.g., {{validateIncreaseDecreaseRequest(List<ContainerResourceChangeRequest>
incRequests, List<ContainerResourceChangeRequest> decRequests)}}, so that it checks both
duplicated IDs and the resource validity of increase and decrease requests?
** For {{validateIncreaseDecreaseRequest}}, we don't check the minimum allocation now; is that
intended? I see that later on you normalize the request so that it is at least the minimum
allocation. Just want to confirm.
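To illustrate the "keep the decrease, drop the increase" idea above, here is a toy sketch (all names are hypothetical stand-ins, not from the patch) that filters conflicting increase requests instead of throwing:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SanityCheckSketch {
    // Hypothetical combined check: when both an increase and a decrease
    // request exist for the same container id, keep the decrease and
    // silently drop the increase instead of throwing an exception.
    static List<Long> filterIncreases(List<Long> incIds, List<Long> decIds) {
        Set<Long> decreases = new HashSet<>(decIds);
        List<Long> kept = new ArrayList<>();
        for (Long id : incIds) {
            if (!decreases.contains(id)) {
                kept.add(id); // no conflicting decrease: keep the increase
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> inc = Arrays.asList(1L, 2L, 3L);
        List<Long> dec = Arrays.asList(2L);
        System.out.println(filterIncreases(inc, dec)); // prints [1, 3]
    }
}
```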

* For {{SchedulerApplicationAttempt.pullNewlyUpdatedContainers}}:
** This function is used by both {{pullNewlyIncreasedContainers()}} and {{pullNewlyDecreasedContainers()}}.
Why do we need to call {{updateContainerAndNMToken}} for decreased containers? It also unnecessarily
sends an ACQUIRE_UPDATED_CONTAINER event for every decreased container.
** We should probably check for null before adding {{updatedContainer}}:
{code:title=pullNewlyUpdatedContainers}
      Container updatedContainer = updateContainerAndNMToken(rmContainer, false);
      returnContainerList.add(updatedContainer);
{code}
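A minimal sketch of the null guard, with toy types; it assumes (on my part) that {{updateContainerAndNMToken}} can return null when no updated container is produced:

```java
import java.util.ArrayList;
import java.util.List;

public class NullGuardSketch {
    // Hypothetical stand-in for updateContainerAndNMToken(), assumed
    // here to return null when no updated container is produced.
    static String updateContainerAndNMToken(boolean canUpdate) {
        return canUpdate ? "updatedContainer" : null;
    }

    public static void main(String[] args) {
        List<String> returnContainerList = new ArrayList<>();
        for (boolean canUpdate : new boolean[] {true, false, true}) {
            String updated = updateContainerAndNMToken(canUpdate);
            if (updated != null) { // guard before adding to the return list
                returnContainerList.add(updated);
            }
        }
        System.out.println(returnContainerList.size()); // prints 2
    }
}
```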

* It seems {{RMNodeImpl.pullNewlyIncreasedContainers()}} is empty?

* The following function doesn't seem to be used?
{code:title=AppSchedulingInfo}
  public synchronized void notifyContainerStopped(RMContainer rmContainer) {
    // remove from pending increase request map if it exists
    removeIncreaseRequest(rmContainer.getAllocatedNode(),
        rmContainer.getAllocatedPriority(), rmContainer.getContainerId());
  }
{code}

* In {{IncreaseContainerAllocator.assignContainers}}:
** I think the following is a typo; it should be {{if (cannotAllocateAnything)}}, right?
{code}
          if (shouldUnreserve) {
            LOG.debug("We cannot allocate anything because of low headroom, "
                + "headroom=" + resourceLimits.getHeadroom());
          }
{code}
** Not sure I understand the logic. Why only break when {{node.getReservedContainer() ==
null}}? Shouldn't we break out of the loop here no matter what?
{code}
       while (iter.hasNext()) {
          ...
          ...
          // Try to allocate the increase request
          assigned = allocateIncreaseRequest(node, increaseRequest);
          if (node.getReservedContainer() == null) {
            // if it's not a reserved increase request, we will record
            // priority/containerId so that we can remove the request later
            increasedContainerPriority = priority;
            increasedContainerId = rmContainer.getContainerId();
            break;
          }
       }  
{code}
** Is the following needed?
{code}
      if (increasedContainerId != null) {
        // If we increased (not reserved) a new increase request, we should
        // remove it from request map.
        application.removeIncreaseRequest(nodeId, increasedContainerPriority,
            increasedContainerId);
      }
{code}
I think earlier, in the {{allocateIncreaseRequest()}} function, if a new increase is successfully
allocated, {{application.increaseContainer(increaseRequest)}} will already have removed the
increase request?
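To make the loop-break question concrete, here is a toy model (hypothetical stand-ins for the scheduler types) of attempting at most one increase request per call, recording its id only when the attempt produced a real assignment rather than a reservation:

```java
import java.util.Arrays;
import java.util.List;

public class IncreaseLoopSketch {
    // Toy model of the loop in assignContainers: attempt at most one
    // increase request per call, and record its id only when the attempt
    // produced a real assignment (not a reservation).
    static Long tryOne(List<Long> requestIds, boolean reserved) {
        Long recordedId = null;
        for (Long id : requestIds) {
            // stand-in for allocateIncreaseRequest(node, increaseRequest)
            if (!reserved) {
                recordedId = id; // remember it so it can be removed later
            }
            break; // break unconditionally after one attempt
        }
        return recordedId;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(7L, 8L);
        System.out.println(tryOne(ids, false)); // prints 7
        System.out.println(tryOne(ids, true));  // prints null
    }
}
```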
* In {{RMContainerImpl.java}}:
IIUC, {{containerIncreased}} indicates that an increase is done in the scheduler, and {{containerIncreasedAndAcquired}}
indicates that an increase has been acquired by the AM.
If so, then in {{NMReportedContainerChangeIsDoneTransition}}:
{code}
    public void transition(RMContainerImpl container, RMContainerEvent event) {
      if (container.containerIncreased) {
        // If container is increased but not acquired by AM, we will start
        // containerAllocationExpirer for this container in this transition.
        container.containerAllocationExpirer.unregister(event.getContainerId());
        container.containerIncreasedAndAcquired = false;
      }
    }
{code}
Shouldn't it be changed to:
{code}
    public void transition(RMContainerImpl container, RMContainerEvent event) {
      if (container.containerIncreasedAndAcquired ) {                                 <=======================
        // If container is increased but not acquired by AM, we will start
        // containerAllocationExpirer for this container in this transition.
        container.containerAllocationExpirer.unregister(event.getContainerId());
        container.containerIncreasedAndAcquired = false;
        container.containerIncreased = false;                                     <=======================
      }
    }
{code}
Also, is *container.containerIncreased* really needed?
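To make the flag handling above concrete, a toy model of the suggested transition (hypothetical, with the expirer and event types stripped down to a boolean):

```java
public class ExpiryFlagSketch {
    // Hypothetical mirror of the two RMContainerImpl flags discussed above.
    boolean containerIncreased = true;            // increase done in scheduler
    boolean containerIncreasedAndAcquired = true; // increase acquired by AM

    boolean expirerRegistered = true;

    // Suggested transition: only an acquired increase has a running
    // expirer to unregister, and both flags are cleared afterwards.
    void onNMReportedChangeIsDone() {
        if (containerIncreasedAndAcquired) {
            expirerRegistered = false; // stand-in for expirer.unregister(id)
            containerIncreasedAndAcquired = false;
            containerIncreased = false;
        }
    }

    public static void main(String[] args) {
        ExpiryFlagSketch c = new ExpiryFlagSketch();
        c.onNMReportedChangeIsDone();
        System.out.println(c.containerIncreased + " "
            + c.containerIncreasedAndAcquired + " " + c.expirerRegistered);
        // prints false false false
    }
}
```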

* A few typos:
*updateNodeHeartbeatResposneForContainersDecreasing*
*cancelInceaseReservation*


> CapacityScheduler side changes to support increase/decrease container resource.
> -------------------------------------------------------------------------------
>
>                 Key: YARN-1651
>                 URL: https://issues.apache.org/jira/browse/YARN-1651
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager, scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-1651-1.YARN-1197.patch, YARN-1651-2.YARN-1197.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
