hadoop-yarn-issues mailing list archives

From "Peter Bacsko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-9436) Flaky test testApplicationLifetimeMonitor
Date Wed, 03 Apr 2019 14:58:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808805#comment-16808805 ]

Peter Bacsko commented on YARN-9436:
------------------------------------

Whoah, thanks [~Prabhu Joseph] - yes it's exactly the same. I'm closing this.

> Flaky test testApplicationLifetimeMonitor
> -----------------------------------------
>
>                 Key: YARN-9436
>                 URL: https://issues.apache.org/jira/browse/YARN-9436
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: scheduler, test
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>
> In our test environment, we occasionally encounter this failure:
> {noformat}
> 2019-04-03 12:49:32 [INFO] Running org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor
> 2019-04-03 12:53:08 [ERROR] Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 215.535 s <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor
> 2019-04-03 12:53:08 [ERROR] testApplicationLifetimeMonitor[0](org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor) Time elapsed: 34.244 s  <<< FAILURE!
> 2019-04-03 12:53:08 java.lang.AssertionError: Application killed before lifetime value
> 2019-04-03 12:53:08 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestApplicationLifetimeMonitor.testApplicationLifetimeMonitor(TestApplicationLifetimeMonitor.java:218)
> 2019-04-03 12:53:08 
> {noformat}
> The root cause is the condition here:
> {noformat}
>         Assert.assertTrue("Application killed before lifetime value",
>             totalTimeRun > maxLifetime);
> {noformat}
> However, there are two problems with this condition:
> 1. It is logically incorrect. Since the app should be killed after 30 seconds,
> one would expect to see {{totalTimeRun = maxLifetime}}. Due to some asynchronicity and rounding,
> {{totalTimeRun}} ends up being 31 most of the time, which is why the test usually passes.
> 2. Sometimes the application is killed fast enough that {{totalTimeRun}} is exactly 30. This
> is still correct, because in {{setUpCSQueue}} we set the queue lifetime:
> {noformat}
>     csConf.setMaximumLifetimePerQueue(
>         CapacitySchedulerConfiguration.ROOT + ".default", maxLifetime);
>     csConf.setDefaultLifetimePerQueue(
>         CapacitySchedulerConfiguration.ROOT + ".default", defaultLifetime);
> {noformat}
> A more appropriate condition is:
> {noformat}
>         Assert.assertTrue("Application killed before lifetime value",
>             totalTimeRun >= maxLifetime);
> {noformat}
> The assertion message in the next line is also misleading:
> {noformat}
>         Assert.assertTrue(
>             "Application killed before lifetime value " + totalTimeRun,
>             totalTimeRun < maxLifetime + 10L);
> {noformat}
> If it is false, it means that the application was killed _after_ 40 seconds, which exceeds
> both the app's lifetime (40s) and that of the queue (30s). A more accurate message would be:
> {noformat}
>         Assert.assertTrue(
>             "Application killed after queue/app lifetime value: " + totalTimeRun,
>             totalTimeRun < maxLifetime + 10L);
> {noformat}
> We can even be stricter, since we expect a kill almost immediately after 30 seconds:
> {noformat}
>         Assert.assertTrue(
>             "Application killed too late: " + totalTimeRun,
>             totalTimeRun < maxLifetime + 2L);
> {noformat}
> where we allow a 2 second tolerance.
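
To make the proposed bounds concrete, here is a minimal standalone sketch of the combined check: the run time must be at least {{maxLifetime}} and strictly less than {{maxLifetime}} plus the tolerance. The class {{LifetimeBounds}} and the helpers {{toWholeSeconds}} and {{check}} are hypothetical names for illustration and are not part of the Hadoop test. The truncating millisecond-to-second conversion is an assumption about how the rounding mentioned in the description could arise; the test's actual computation of {{totalTimeRun}} is not shown in this message.

```java
class LifetimeBounds {

    // Hypothetical helper (assumption): elapsed run time truncated to whole seconds.
    static long toWholeSeconds(long submitMillis, long finishMillis) {
        return (finishMillis - submitMillis) / 1000L;
    }

    // Returns null when totalTimeRun lies in [maxLifetime, maxLifetime + tolerance),
    // otherwise a failure message in the style proposed in the issue description.
    static String check(long totalTimeRun, long maxLifetime, long tolerance) {
        if (totalTimeRun < maxLifetime) {
            return "Application killed before lifetime value: " + totalTimeRun;
        }
        if (totalTimeRun >= maxLifetime + tolerance) {
            return "Application killed too late: " + totalTimeRun;
        }
        return null;
    }

    public static void main(String[] args) {
        long maxLifetime = 30L; // queue lifetime in seconds, as in the description
        long tolerance = 2L;    // the 2 second tolerance suggested in the description
        long[] observed = {29L, 30L, 31L, 32L};
        for (long totalTimeRun : observed) {
            String failure = check(totalTimeRun, maxLifetime, tolerance);
            System.out.println(totalTimeRun + "s: " + (failure == null ? "ok" : failure));
        }
    }
}
```

With these bounds, 30 and 31 seconds are accepted, while 29 fails as killed too early and 32 fails as killed too late. The original {{totalTimeRun > maxLifetime}} condition would reject the legitimate value of 30.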



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

