hadoop-yarn-issues mailing list archives

From "Bibin A Chundatt (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4618) RM Stops allocating containers if large number of pending containers
Date Fri, 22 Jan 2016 05:41:39 GMT

     [ https://issues.apache.org/jira/browse/YARN-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-4618:
-----------------------------------
    Description: 
In one of our tests, we found that when the RM has a very large number of pending container requests to serve, it stops assigning containers.

The simulated cluster has 100 TB of memory.

Root total = 60k containers
Queue 1 = 30k containers = 1328800000 MB
Queue 2 = 30k containers = 1428800000 MB
Each container request is for 40 GB.
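
To make the overflow concrete, here is a minimal Java sketch. It assumes, as the negative value reported below suggests, that the parent's pending memory is the sum of the two queues' values held as a Java int in MB. The class and method names are illustrative only, not Hadoop code.

```java
// Sketch: why summing the two queues' pending memory overflows a 32-bit int.
// Assumption (hypothetical): pending memory is aggregated in MB as a Java int.
public class PendingOverflow {

    // Sum as an int-based aggregate would: silently wraps around on overflow.
    static int sumAsInt(int queue1Mb, int queue2Mb) {
        return queue1Mb + queue2Mb;
    }

    // Same sum widened to long, which cannot overflow at these magnitudes.
    static long sumAsLong(int queue1Mb, int queue2Mb) {
        return (long) queue1Mb + queue2Mb;
    }

    public static void main(String[] args) {
        int queue1Mb = 1_328_800_000; // Queue 1 pending, from the description
        int queue2Mb = 1_428_800_000; // Queue 2 pending, from the description

        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("int sum  = " + sumAsInt(queue1Mb, queue2Mb));
        System.out.println("long sum = " + sumAsLong(queue1Mb, queue2Mb));

        // Math.addExact throws instead of wrapping, making the overflow explicit.
        try {
            Math.addExact(queue1Mb, queue2Mb);
        } catch (ArithmeticException e) {
            System.out.println("addExact: " + e.getMessage());
        }
    }
}
```

Each queue on its own is below Integer.MAX_VALUE (2147483647), but the int sum wraps to -1537367296 while the long sum is 2757600000. This does not reproduce the exact -167XXXXXXX MB figure from the test, whose full pending total is not given here; it only shows the mechanism.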


The relevant check in {{ParentQueue#assignContainers}} is shown below:
{noformat}
    // Check if this queue need more resource, simply skip allocation if this
    // queue doesn't need more resources.
    if (!super.hasPendingResourceRequest(node.getPartition(),
        clusterResource, schedulingMode)) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Skip this queue=" + getQueuePath()
            + ", because it doesn't need more resource, schedulingMode="
            + schedulingMode.name() + " node-partition=" + node.getPartition());
      }
      return CSAssignment.NULL_ASSIGNMENT;
    }
{noformat}

When the pending resource exceeds {{Integer.MAX_VALUE}}, it overflows and becomes *negative* ({{-167XXXXXXX MB}}), so NULL_ASSIGNMENT is always returned.
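
The effect on the skip check can be modelled with a hypothetical, simplified Java sketch. It is NOT the Hadoop implementation: it assumes that "has pending resource request" reduces to "pending memory (an int, in MB) is greater than zero", and that the root's pending total is the sum of its children's. "SKIP" stands in for CSAssignment.NULL_ASSIGNMENT.

```java
// Hypothetical, simplified model of the early return in ParentQueue#assignContainers.
// Assumption: the pending check is "pending memory (int, MB) > 0".
public class PendingCheckModel {

    // Stand-in for hasPendingResourceRequest: true only when pending memory is positive.
    static boolean hasPendingResourceRequest(int pendingMb) {
        return pendingMb > 0;
    }

    // Stand-in for the skip logic: "SKIP" models CSAssignment.NULL_ASSIGNMENT.
    static String assignContainers(int pendingMb) {
        if (!hasPendingResourceRequest(pendingMb)) {
            return "SKIP";
        }
        return "TRY_ALLOCATE";
    }

    public static void main(String[] args) {
        int queue1Mb = 1_328_800_000;
        int queue2Mb = 1_428_800_000;

        // Each child queue on its own stays below Integer.MAX_VALUE and passes the check.
        System.out.println("queue1: " + assignContainers(queue1Mb));
        System.out.println("queue2: " + assignContainers(queue2Mb));

        // The root aggregates both; the int sum wraps negative, so the root is skipped
        // and allocation never reaches the children, although many containers are pending.
        int rootPendingMb = queue1Mb + queue2Mb;
        System.out.println("root pending = " + rootPendingMb + " MB -> "
                + assignContainers(rootPendingMb));
    }
}
```

Because allocation starts at root, a skip there blocks every queue below it, which matches the observed symptom of the RM assigning no containers at all.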

Tool used for testing: SLS (Scheduler Load Simulator).

When checking for pending resource requests, we should first check (from getMetrics()) whether there are any pending containers to be served. If there are, return true; otherwise, apply the other check for increase requests.
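
The proposed ordering could look roughly like the Java sketch below. QueueStats and its fields are hypothetical stand-ins for what the queue's metrics and usage would report; they are not Hadoop APIs, and this is only an illustration of the idea, not a patch.

```java
// Hypothetical sketch of the proposed check order: pending containers first,
// then the existing resource-based check (which also covers increase requests).
// QueueStats is an illustrative stand-in, not a Hadoop class.
public class ProposedPendingCheck {

    static final class QueueStats {
        final int pendingContainers; // e.g. what getMetrics() would report as pending
        final int pendingMb;         // aggregated pending memory; may have overflowed

        QueueStats(int pendingContainers, int pendingMb) {
            this.pendingContainers = pendingContainers;
            this.pendingMb = pendingMb;
        }
    }

    static boolean hasPendingResourceRequest(QueueStats stats) {
        // Step 1: any pending container means there is work to schedule,
        // regardless of what the (possibly overflowed) memory total says.
        if (stats.pendingContainers > 0) {
            return true;
        }
        // Step 2: otherwise fall back to the resource-based check,
        // e.g. for container increase requests.
        return stats.pendingMb > 0;
    }

    public static void main(String[] args) {
        int overflowedMb = 1_328_800_000 + 1_428_800_000; // wraps to a negative int
        System.out.println(hasPendingResourceRequest(new QueueStats(60_000, overflowedMb)));
        System.out.println(hasPendingResourceRequest(new QueueStats(0, 0)));
        System.out.println(hasPendingResourceRequest(new QueueStats(0, 4_096)));
    }
}
```

With 60k pending containers the check now returns true even though the memory total has wrapped negative; with no pending containers it behaves as the resource-only check would.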

Thoughts?

  was:
In one of our tests, we found that when the RM has a very large number of pending container requests to serve, it stops assigning containers.

The simulated cluster has 100 TB of memory.

Root total = 600k containers
Queue 1 = 300k containers = 1328800000 MB
Queue 2 = 300k containers = 1428800000 MB
Each container request is for 4 GB.


The relevant check in {{ParentQueue#assignContainers}} is shown below:
{noformat}
    // Check if this queue need more resource, simply skip allocation if this
    // queue doesn't need more resources.
    if (!super.hasPendingResourceRequest(node.getPartition(),
        clusterResource, schedulingMode)) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Skip this queue=" + getQueuePath()
            + ", because it doesn't need more resource, schedulingMode="
            + schedulingMode.name() + " node-partition=" + node.getPartition());
      }
      return CSAssignment.NULL_ASSIGNMENT;
    }
{noformat}

When the pending resource exceeds {{Integer.MAX_VALUE}}, it overflows and becomes *negative* ({{-167XXXXXXX MB}}), so NULL_ASSIGNMENT is always returned.

Tool used for testing: SLS (Scheduler Load Simulator).

When checking for pending resource requests, we should first check (from getMetrics()) whether there are any pending containers to be served. If there are, return true; otherwise, apply the other check for increase requests.

Thoughts?

> RM Stops allocating containers if large number of pending containers
> --------------------------------------------------------------------
>
>                 Key: YARN-4618
>                 URL: https://issues.apache.org/jira/browse/YARN-4618
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)