hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4686) MiniYARNCluster.start() returns before cluster is completely started
Date Thu, 17 Mar 2016 19:25:33 GMT

    [ https://issues.apache.org/jira/browse/YARN-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200203#comment-15200203 ]

Jason Lowe commented on YARN-4686:

bq. Still interested in if Jason Lowe or Karthik Kambatla have comments, especially about
removal of the (extra) threads in startResourceManager and serviceStart methods.

The thread removal is key, IMHO.  MiniYARNCluster was a source of flaky tests because those
threads allowed the mini cluster to return from its start method before its subcomponents
completed their start methods.  That means tests that assumed the cluster was started after
cluster.start() were making a bad assumption.  Removing these threads means the cluster really
is started after the start method, assuming the RM and NM start methods correctly return only
after they have started.
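
To make the race concrete, here is a minimal, self-contained sketch. It is not code from the patch or from MiniYARNCluster: Component, threadedStart and inlineStart are hypothetical stand-ins, and a latch is assumed as a simple way to simulate a slow subcomponent start.

```java
import java.util.concurrent.CountDownLatch;

public class MiniClusterStartSketch {

    /** Hypothetical stand-in for a subcomponent such as the RM or an NM. */
    static class Component {
        private final CountDownLatch gate;        // keeps start() busy until released
        private volatile boolean started = false;

        Component(CountDownLatch gate) {
            this.gate = gate;
        }

        void start() {
            try {
                gate.await();                     // simulates slow startup work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            started = true;
        }

        boolean isStarted() {
            return started;
        }
    }

    /** Threaded shape: hand the component to a helper thread and return at once. */
    static Thread threadedStart(Component c) {
        Thread t = new Thread(c::start);
        t.start();
        return t;
    }

    /** Inline shape: run the component's start method on the caller's thread. */
    static void inlineStart(Component c) {
        c.start();
    }

    /** Reports whether the component was started when threadedStart returned. */
    static boolean startedRightAfterThreadedStart() {
        CountDownLatch gate = new CountDownLatch(1);
        Component c = new Component(gate);
        Thread t = threadedStart(c);
        boolean observed = c.isStarted();         // gate is still closed here
        gate.countDown();                         // let the helper thread finish
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed;
    }

    /** Reports whether the component was started when inlineStart returned. */
    static boolean startedRightAfterInlineStart() {
        CountDownLatch gate = new CountDownLatch(0);  // gate already open
        Component c = new Component(gate);
        inlineStart(c);
        return c.isStarted();
    }

    public static void main(String[] args) {
        System.out.println("threaded start, started on return: "
            + startedRightAfterThreadedStart());
        System.out.println("inline start, started on return: "
            + startedRightAfterInlineStart());
    }
}
```

In the threaded shape the caller regains control while the component is still starting, which is the bad assumption the flaky tests were making. In the inline shape the caller cannot proceed until start() has returned.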

+1 patch looks good to me.  I'm OK either way on the blind or checked transition to active
since it's a fast no-op in the non-HA case.  It will generate an extra "Already in active
state" info message in the test logs but is otherwise benign.

> MiniYARNCluster.start() returns before cluster is completely started
> --------------------------------------------------------------------
>                 Key: YARN-4686
>                 URL: https://issues.apache.org/jira/browse/YARN-4686
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>            Reporter: Rohith Sharma K S
>            Assignee: Eric Badger
>         Attachments: MAPREDUCE-6507.001.patch, YARN-4686.001.patch, YARN-4686.002.patch, YARN-4686.003.patch, YARN-4686.004.patch, YARN-4686.005.patch, YARN-4686.006.patch
>
> TestRMNMInfo fails intermittently. Below is trace for the failure
> {noformat}
> testRMNMInfo(org.apache.hadoop.mapreduce.v2.TestRMNMInfo)  Time elapsed: 0.28 sec  <<<
> java.lang.AssertionError: Unexpected number of live nodes: expected:<4> but was:<3>
> 	at org.junit.Assert.fail(Assert.java:88)
> 	at org.junit.Assert.failNotEquals(Assert.java:743)
> 	at org.junit.Assert.assertEquals(Assert.java:118)
> 	at org.junit.Assert.assertEquals(Assert.java:555)
> 	at org.apache.hadoop.mapreduce.v2.TestRMNMInfo.testRMNMInfo(TestRMNMInfo.java:111)
> {noformat}
