hadoop-yarn-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6920) Fix TestNMClient failure due to YARN-6706
Date Thu, 03 Aug 2017 19:14:01 GMT

    [ https://issues.apache.org/jira/browse/YARN-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113318#comment-16113318
] 

Arun Suresh commented on YARN-6920:
-----------------------------------

Without this patch, the test case times out (the test case timeout is 200s) and you should
see the following in the logs:
{noformat}
....
2017-08-03 11:58:04,094 INFO  container.ContainerImpl (ContainerImpl.java:transition(1382))
- Relaunching Container [container_1501786677410_0001_01_000002] for re-initialization !!
2017-08-03 11:58:04,094 INFO  container.ContainerImpl (ContainerImpl.java:handle(1691)) -
Container container_1501786677410_0001_01_000002 transitioned from REINITIALIZING to SCHEDULED
2017-08-03 11:58:04,094 WARN  scheduler.ContainerScheduler (ContainerScheduler.java:pickOpportunisticContainersToKill(384))
- There are no sufficient resources to start guaranteed [container_1501786677410_0001_01_000002]at
the moment. Opportunistic containers are in the process ofbeing killed to make room.
....
{noformat}

With the patch, if the test still fails for you, it is probably due to some other assertion
failure rather than a timeout, and you should not see the above call to
{{pickOpportunisticContainersToKill()}} in the logs.

Reason:
During container re-initialization, the container process is killed and re-launched. This
transfers control back to the ContainerScheduler, which, after YARN-6706, always checks
whether resources are available to launch the container, irrespective of whether queuing is
turned on or off. Unfortunately, when the container was killed for re-initialization, we
neglected to subtract (reclaim) the container's resources from the utilization tracker, so
the aforementioned check fails on re-launch: the tracker still counts the killed container's
resources as in use. This patch makes sure the resources are reclaimed.
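
To illustrate the accounting problem described above, here is a minimal, self-contained sketch. It is not the actual NodeManager code: the class {{UtilizationSketch}}, its methods, the {{reclaimOnKill}} flag and the memory figures are all hypothetical names and values chosen for illustration only. It models a tracker that checks capacity before every launch, and shows why a re-launch is refused when the killed container's resources were never subtracted.

```java
/**
 * Hypothetical, simplified stand-in for a node's resource utilization
 * tracker. Names and numbers are assumptions, not the real YARN classes.
 */
class UtilizationSketch {
    private final long capacityMb;
    private long allocatedMb = 0;

    UtilizationSketch(long capacityMb) {
        this.capacityMb = capacityMb;
    }

    /** The check performed before every launch, including re-launches. */
    boolean hasResourcesFor(long requestMb) {
        return allocatedMb + requestMb <= capacityMb;
    }

    /** Record resources as in use when a container starts. */
    void add(long mb) {
        allocatedMb += mb;
    }

    /** Reclaim resources when a container's process goes away. */
    void subtract(long mb) {
        allocatedMb -= mb;
    }

    /**
     * Models re-initialization: the process is killed, then re-launched.
     * With reclaimOnKill == false the kill leaves the old allocation in
     * place, so the container is effectively counted twice at re-launch.
     * Returns true if the re-launch can start immediately.
     */
    boolean reinitialize(long containerMb, boolean reclaimOnKill) {
        if (reclaimOnKill) {
            subtract(containerMb);
        }
        if (!hasResourcesFor(containerMb)) {
            return false; // the container would wait in the scheduler
        }
        add(containerMb);
        return true;
    }

    public static void main(String[] args) {
        UtilizationSketch withoutReclaim = new UtilizationSketch(4096);
        withoutReclaim.add(3072); // container already running
        System.out.println(withoutReclaim.reinitialize(3072, false)); // false

        UtilizationSketch withReclaim = new UtilizationSketch(4096);
        withReclaim.add(3072);
        System.out.println(withReclaim.reinitialize(3072, true)); // true
    }
}
```

In the first case the tracker sees 3072 + 3072 > 4096 and refuses the re-launch, which corresponds to the container sitting in SCHEDULED until the test times out. In the second case the allocation is reclaimed first, so the check passes.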


> Fix TestNMClient failure due to YARN-6706
> -----------------------------------------
>
>                 Key: YARN-6920
>                 URL: https://issues.apache.org/jira/browse/YARN-6920
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-6920.001.patch, YARN-6920.002.patch, YARN-6920.003.patch, YARN-6920.004.patch
>
>
> Looks like {{TestNMClient}} has been failing for a while. Opening this JIRA to track the fix.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

