hadoop-yarn-issues mailing list archives

From "Szilard Nemeth (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (YARN-9430) Recovering containers does not check available resources on node
Date Wed, 03 Apr 2019 09:56:01 GMT

     [ https://issues.apache.org/jira/browse/YARN-9430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth reassigned YARN-9430:
------------------------------------

    Assignee:     (was: Szilard Nemeth)

> Recovering containers does not check available resources on node
> ----------------------------------------------------------------
>
>                 Key: YARN-9430
>                 URL: https://issues.apache.org/jira/browse/YARN-9430
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Szilard Nemeth
>            Priority: Critical
>
> I have a testcase that checks that if some GPU devices go offline and recovery happens, only the containers that fit into the node's resources are recovered. Unfortunately, this is not the case: the RM does not check the available resources on the node during recovery.
> *Detailed explanation:*
> *Testcase:* 
>  1. There are 2 nodes running NodeManagers.
>  2. nvidia-smi is replaced with a fake bash script that initially reports 2 GPU devices per node, i.e. 4 GPU devices in the cluster altogether.
>  3. RM / NM recovery is enabled.
>  4. The test starts a sleep job, requesting 4 containers with 1 GPU device each (the AM does not request GPUs).
>  5. Before restart, the fake bash script is adjusted to report 1 GPU device per node (2 in the cluster) after restart.
>  6. Restart is initiated.
>  
> *Expected behavior:* 
>  After restart, only the AM and 2 normal containers should have been started, as there are only 2 GPU devices in the cluster.
>  
> *Actual behavior:* 
>  The AM + 4 containers are allocated, i.e. all of the containers originally started in step 4.
> App id was: 1553977186701_0001
> *Logs*:
>  
> {code:java}
> 2019-03-30 13:22:30,299 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Processing event for appattempt_1553977186701_0001_000001 of type RECOVER
> 2019-03-30 13:22:30,366 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1553977186701_0001_000001 to scheduler from user: systest
> 2019-03-30 13:22:30,366 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: appattempt_1553977186701_0001_000001 is recovering. Skipping notifying ATTEMPT_ADDED
> 2019-03-30 13:22:30,367 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1553977186701_0001_000001 State change from NEW to LAUNCHED on event = RECOVER
> 2019-03-30 13:22:33,257 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Recovering container [container_e84_1553977186701_0001_01_000001, CreateTime: 1553977260732, Version: 0, State: RUNNING, Capability: <memory:1024, vCores:1>, Diagnostics: , ExitStatus: -1000, NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Recovering container [container_e84_1553977186701_0001_01_000004, CreateTime: 1553977272802, Version: 0, State: RUNNING, Capability: <memory:1024, vCores:1, yarn.io/gpu: 1>, Diagnostics: , ExitStatus: -1000, NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,275 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_e84_1553977186701_0001_01_000004 of capacity <memory:1024, vCores:1, yarn.io/gpu: 1> on host snemeth-gpu-2.vpc.cloudera.com:8041, which has 2 containers, <memory:2048, vCores:2, yarn.io/gpu: 1> used and <memory:37252, vCores:6> available after allocation
> 2019-03-30 13:22:33,276 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Recovering container [container_e84_1553977186701_0001_01_000005, CreateTime: 1553977272803, Version: 0, State: RUNNING, Capability: <memory:1024, vCores:1, yarn.io/gpu: 1>, Diagnostics: , ExitStatus: -1000, NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,276 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Processing container_e84_1553977186701_0001_01_000005 of type RECOVER
> 2019-03-30 13:22:33,276 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e84_1553977186701_0001_01_000005 Container Transitioned from NEW to RUNNING
> 2019-03-30 13:22:33,276 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_e84_1553977186701_0001_01_000005 of capacity <memory:1024, vCores:1, yarn.io/gpu: 1> on host snemeth-gpu-2.vpc.cloudera.com:8041, which has 3 containers, <memory:3072, vCores:3, yarn.io/gpu: 2> used and <memory:36228, vCores:5, yarn.io/gpu: -1> available after allocation
> 2019-03-30 13:22:33,279 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Recovering container [container_e84_1553977186701_0001_01_000003, CreateTime: 1553977272166, Version: 0, State: RUNNING, Capability: <memory:1024, vCores:1, yarn.io/gpu: 1>, Diagnostics: , ExitStatus: -1000, NodeLabelExpression: Priority: 0]
> 2019-03-30 13:22:33,280 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: Processing container_e84_1553977186701_0001_01_000003 of type RECOVER
> 2019-03-30 13:22:33,280 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e84_1553977186701_0001_01_000003 Container Transitioned from NEW to RUNNING
> 2019-03-30 13:22:33,280 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Processing event for application_1553977186701_0001 of type APP_RUNNING_ON_NODE
> 2019-03-30 13:22:33,280 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode: Assigned container container_e84_1553977186701_0001_01_000003 of capacity <memory:1024, vCores:1, yarn.io/gpu: 1> on host snemeth-gpu-3.vpc.cloudera.com:8041, which has 2 containers, <memory:2048, vCores:2, yarn.io/gpu: 2> used and <memory:37252, vCores:6, yarn.io/gpu: -1> available after allocation
> 2019-03-30 13:22:33,280 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: SchedulerAttempt appattempt_1553977186701_0001_000001 is recovering container container_e84_1553977186701_0001_01_000003
> {code}
>  
> There are multiple logs like this:
> {code:java}
> Assigned container container_e84_1553977186701_0001_01_000005 of capacity <memory:1024, vCores:1, yarn.io/gpu: 1> on host snemeth-gpu-2.vpc.cloudera.com:8041, which has 3 containers, <memory:3072, vCores:3, yarn.io/gpu: 2> used and <memory:36228, vCores:5, yarn.io/gpu: -1> available after allocation
> {code}
> *Note the -1 value for the yarn.io/gpu resource!* After restart each node reports only 1 GPU, yet two 1-GPU containers are recovered onto the same node, so its available GPU count becomes 1 - 2 = -1.
> The issue lies in this method: [https://github.com/apache/hadoop/blob/e40e2d6ad5cbe782c3a067229270738b501ed27e/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java#L179]
> The problem is that the method deductUnallocatedResource does not check whether, after the container's resource is subtracted, the unallocated resource remains at or above zero; the subtraction is performed unconditionally.
>  Here is the ResourceManager call hierarchy for the method (from top to bottom):
> {code:java}
> 1. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler#handle
> 2. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler#addNode
> 3. org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler#recoverContainersOnNode
> 4. org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode#recoverContainer
> 5. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode#allocateContainer
> 6. org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode#allocateContainer(org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer, boolean)
> deductUnallocatedResource is called here!
> {code}
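> 
> For illustration, here is a simplified sketch (not the actual Hadoop source; names follow the linked SchedulerNode.java) of why the subtraction underflows: Resources.subtractFrom subtracts component-wise without clamping at zero, so recovering a container that no longer fits drives the unallocated value negative.
> {code:java}
> // Simplified sketch of SchedulerNode#deductUnallocatedResource; see the
> // linked source for the real method. Nothing verifies that 'resource'
> // still fits into 'unallocatedResource' before subtracting.
> private synchronized void deductUnallocatedResource(Resource resource) {
>   // e.g. <yarn.io/gpu: 0> minus <yarn.io/gpu: 1> yields <yarn.io/gpu: -1>
>   Resources.subtractFrom(unallocatedResource, resource);
>   Resources.addTo(allocatedResource, resource);
> }
> {code}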
> *Testcase that reproduces the issue:* 
>  *Add this testcase to TestFSSchedulerNode:*
>  
> {code:java}
> @Test
> public void testRecovery() {
>   RMNode node = createNode();
>   FSSchedulerNode schedulerNode = new FSSchedulerNode(node, false);
>   RMContainer container1 = createContainer(Resource.newInstance(4096, 4), null);
>   RMContainer container2 = createContainer(Resource.newInstance(4096, 4), null);
> 
>   schedulerNode.allocateContainer(container1);
>   schedulerNode.containerStarted(container1.getContainerId());
>   schedulerNode.allocateContainer(container2);
>   schedulerNode.containerStarted(container2.getContainerId());
>   assertEquals("All resources of node should have been allocated",
>       nodeResource, schedulerNode.getAllocatedResource());
> 
>   RMContainer container3 = createContainer(Resource.newInstance(1000, 1), null);
>   when(container3.getState()).thenReturn(RMContainerState.NEW);
>   assertEquals("All resources of node should have been allocated",
>       nodeResource, schedulerNode.getAllocatedResource());
> 
>   schedulerNode.recoverContainer(container3);
>   assertEquals("No resource should have been unallocated",
>       Resources.none(), schedulerNode.getUnallocatedResource());
>   assertEquals("All resources of node should have been allocated",
>       nodeResource, schedulerNode.getAllocatedResource());
> }
> {code}
>  
>  
> *Result of testcase:*
> {code:java}
> java.lang.AssertionError: No resource should have been unallocated 
> Expected :<memory:0, vCores:0>
> Actual :<memory:-1000, vCores:-1>
> {code}
> *It's immediately clear that not only GPUs (and other custom resource types) but all resources are affected by this issue!*
>  
> *Possible fix:* 
>  1. A condition needs to be introduced that checks whether there are enough resources on the node; the container's recovery should proceed only if this is true (see the sketch below).
>  2. An error log should be added. At first glance this seems sufficient, so no exception is required, but this needs a more thorough investigation and a manual test on a cluster!
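> 
> A minimal sketch of what such a check could look like (hypothetical, modeled on SchedulerNode#allocateContainer; not a tested patch):
> {code:java}
> // Sketch only: skip recovery when the container no longer fits the node.
> protected synchronized void allocateContainer(RMContainer rmContainer,
>     boolean launchedOnNode) {
>   Container container = rmContainer.getContainer();
>   if (!Resources.fitsIn(container.getResource(), getUnallocatedResource())) {
>     LOG.error("Cannot recover container " + container.getId()
>         + ": requested " + container.getResource()
>         + " exceeds unallocated " + getUnallocatedResource());
>     return;
>   }
>   // ... existing bookkeeping, including the deductUnallocatedResource call
> }
> {code}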
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

