hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From acmur...@apache.org
Subject svn commit: r1567118 [2/2] - /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html
Date Tue, 11 Feb 2014 13:32:08 GMT

Modified: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html?rev=1567118&r1=1567117&r2=1567118&view=diff
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html (original)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html Tue Feb 11 13:32:07 2014
@@ -1,3 +1,2953 @@
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<title>Hadoop  2.3.0 Release Notes</title>
+<STYLE type="text/css">
+	H1 {font-family: sans-serif}
+	H2 {font-family: sans-serif; margin-left: 7mm}
+	TABLE {margin-left: 7mm}
+<h1>Hadoop  2.3.0 Release Notes</h1>
+These release notes include new developer and user-facing incompatibilities, features, and major improvements. 
+<a name="changes"/>
+<h2>Changes since Hadoop 2.2.0</h2>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1642">YARN-1642</a>.
+     Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>RMDTRenewer#getRMClient should use ClientRMProxy</b><br>
+     <blockquote>RMDTRenewer#getRMClient gets a proxy to the RM in the conf directly instead of going through ClientRMProxy. 
+      final YarnRPC rpc = YarnRPC.create(conf);
+      return (ApplicationClientProtocol)rpc.getProxy(ApplicationClientProtocol.class, addr, conf);
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1630">YARN-1630</a>.
+     Major bug reported by Aditya Acharya and fixed by Aditya Acharya (client)<br>
+     <b>Introduce timeout for async polling operations in YarnClientImpl</b><br>
+     <blockquote>I ran an MR2 application that would have been long running, and killed it programmatically using a YarnClient. The app was killed, but the client hung forever. The message that I saw, which spammed the logs, was "Watiting for application application_1389036507624_0018 to be killed."
+The RM log indicated that the app had indeed transitioned from RUNNING to KILLED, but for some reason future responses to the RPC to kill the application did not indicate that the app had been terminated.
+I tracked this down to YarnClientImpl.java, and though I was unable to reproduce the bug, I wrote a patch to introduce a bound on the number of times that YarnClientImpl retries the RPC before giving up.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1629">YARN-1629</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>IndexOutOfBoundsException in Fair Scheduler MaxRunningAppsEnforcer</b><br>
+     <blockquote>This can occur when the second-to-last app in a queue's pending app list is made runnable.  The app is pulled out from under the iterator. </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1628">YARN-1628</a>.
+     Major bug reported by Mit Desai and fixed by Vinod Kumar Vavilapalli <br>
+     <b>TestContainerManagerSecurity fails on trunk</b><br>
+     <blockquote>The Test fails with the following error
+java.lang.IllegalArgumentException: java.net.UnknownHostException: InvalidHost
+	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
+	at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145)
+	at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136)
+	at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testNMTokens(TestContainerManagerSecurity.java:253)
+	at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:144)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1624">YARN-1624</a>.
+     Major bug reported by Aditya Acharya and fixed by Aditya Acharya (scheduler)<br>
+     <b>QueuePlacementPolicy format is not easily readable via a JAXB parser</b><br>
+     <blockquote>The current format for specifying queue placement rules in the fair scheduler allocations file does not lend itself to easy parsing via a JAXB parser. In particular, relying on the tag name to encode information about which rule to use makes it very difficult for an xsd-based JAXB parser to preserve the order of the rules, which is essential.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1623">YARN-1623</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>Include queue name in RegisterApplicationMasterResponse</b><br>
+     <blockquote>This provides the YARN change necessary to support MAPREDUCE-5732.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1618">YARN-1618</a>.
+     Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>Fix invalid RMApp transition from NEW to FINAL_SAVING</b><br>
+     <blockquote>YARN-891 augments the RMStateStore to store information on completed applications. In the process, it adds transitions from NEW to FINAL_SAVING. This leads to the RM trying to update entries in the state-store that do not exist. On ZKRMStateStore, this leads to the RM crashing. 
+Previous description:
+ZKRMStateStore fails to handle updates to znodes that don't exist. For instance, this can happen when an app transitions from NEW to FINAL_SAVING. In these cases, the store should create the missing znode and handle the update.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1616">YARN-1616</a>.
+     Trivial improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>RMFatalEventDispatcher should log the cause of the event</b><br>
+     <blockquote>RMFatalEventDispatcher#handle() logs the receipt of an event and its type, but leaves out the cause. The cause captures why the event was raised and would help debugging issues. </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1608">YARN-1608</a>.
+     Trivial bug reported by Karthik Kambatla and fixed by Karthik Kambatla (nodemanager)<br>
+     <b>LinuxContainerExecutor has a few DEBUG messages at INFO level</b><br>
+     <blockquote>LCE has a few INFO level log messages meant to be at debug level. In fact, they are logged both at INFO and DEBUG. </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1607">YARN-1607</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
+     <b>TestRM expects the capacity scheduler</b><br>
+     <blockquote>We should either explicitly set the Capacity Scheduler or make it scheduler-agnostic</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1603">YARN-1603</a>.
+     Trivial bug reported by Zhijie Shen and fixed by Zhijie Shen <br>
+     <b>Remove two *.orig files which were unexpectedly committed</b><br>
+     <blockquote>FairScheduler.java.orig and TestFifoScheduler.java.orig</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1601">YARN-1601</a>.
+     Major bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur <br>
+     <b>3rd party JARs are missing from hadoop-dist output</b><br>
+     <blockquote>With the build changes of YARN-888 we are leaving out all 3rd party JArs used directly by YARN under /share/hadoop/yarn/lib/.
+We did not notice this when running minicluster because they all happen to be in the classpath from hadoop-common and hadoop-yarn.
+As 3d party JARs are not 'public' interfaces we cannot rely on them being provided to yarn by common and hdfs. (ie if common and hdfs stop using a 3rd party dependency that yarn uses this would break yarn if yarn does not pull that dependency explicitly).
+Also, this will break bigtop hadoop build when they move to use branch-2 as they expect to find jars in /share/hadoop/yarn/lib/</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1600">YARN-1600</a>.
+     Blocker bug reported by Jason Lowe and fixed by Haohui Mai (resourcemanager)<br>
+     <b>RM does not startup when security is enabled without spnego configured</b><br>
+     <blockquote>We have a custom auth filter in front of our various UI pages that handles user authentication.  However currently the RM assumes that if security is enabled then the user must have configured spnego as well for the RM web pages which is not true in our case.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1598">YARN-1598</a>.
+     Critical sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (client , resourcemanager)<br>
+     <b>HA-related rmadmin commands don't work on a secure cluster</b><br>
+     <blockquote>The HA-related commands like -getServiceState -checkHealth etc. don't work in a secure cluster.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1579">YARN-1579</a>.
+     Trivial sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>ActiveRMInfoProto fields should be optional</b><br>
+     <blockquote>Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1575">YARN-1575</a>.
+     Critical sub-task reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
+     <b>Public localizer crashes with "Localized unkown resource"</b><br>
+     <blockquote>The public localizer can crash with the error:
+2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
+2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1574">YARN-1574</a>.
+     Blocker sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
+     <b>RMDispatcher should be reset on transition to standby</b><br>
+     <blockquote>Currently, we move rmDispatcher out of ActiveService. But we still register the Event dispatcher, such as schedulerDispatcher, RMAppEventDispatcher when we initiate the ActiveService.
+Almost every time when we transit RM from Active to Standby,  we need to initiate the ActiveService. That means we will register the same event Dispatcher which will cause the same event will be handled several times.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1573">YARN-1573</a>.
+     Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>ZK store should use a private password for root-node-acls</b><br>
+     <blockquote>Currently, when HA is enabled, ZK store uses cluster-timestamp as the password for root node ACLs to give the Active RM exclusive access to the store. A more private value like a random number might be better. </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1568">YARN-1568</a>.
+     Trivial task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>Rename clusterid to clusterId in ActiveRMInfoProto </b><br>
+     <blockquote>YARN-1029 introduces ActiveRMInfoProto - just realized it defines a field clusterid, which is inconsistent with other fields. Better to fix it immediately than leave the inconsistency. </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1567">YARN-1567</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1560">YARN-1560</a>.
+     Major test reported by Ted Yu and fixed by Ted Yu <br>
+     <b>TestYarnClient#testAMMRTokens fails with null AMRM token</b><br>
+     <blockquote>The following can be reproduced locally:
+testAMMRTokens(org.apache.hadoop.yarn.client.api.impl.TestYarnClient)  Time elapsed: 3.341 sec  &lt;&lt;&lt; FAILURE!
+junit.framework.AssertionFailedError: null
+  at junit.framework.Assert.fail(Assert.java:48)
+  at junit.framework.Assert.assertTrue(Assert.java:20)
+  at junit.framework.Assert.assertNotNull(Assert.java:218)
+  at junit.framework.Assert.assertNotNull(Assert.java:211)
+  at org.apache.hadoop.yarn.client.api.impl.TestYarnClient.testAMMRTokens(TestYarnClient.java:382)
+This test didn't appear in https://builds.apache.org/job/Hadoop-Yarn-trunk/442/consoleFull</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1559">YARN-1559</a>.
+     Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>Race between ServerRMProxy and ClientRMProxy setting RMProxy#INSTANCE</b><br>
+     <blockquote>RMProxy#INSTANCE is a non-final static field and both ServerRMProxy and ClientRMProxy set it. This leads to races as witnessed on - YARN-1482.
+Sample trace:
+java.lang.IllegalArgumentException: RM does not support this client protocol
+        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
+        at org.apache.hadoop.yarn.client.ClientRMProxy.checkAllowedProtocols(ClientRMProxy.java:119)
+        at org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider.init(ConfiguredRMFailoverProxyProvider.java:58)
+        at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:158)
+        at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:88)
+        at org.apache.hadoop.yarn.server.api.ServerRMProxy.createRMProxy(ServerRMProxy.java:56)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1549">YARN-1549</a>.
+     Major test reported by Ted Yu and fixed by haosdent <br>
+     <b>TestUnmanagedAMLauncher#testDSShell fails in trunk</b><br>
+     <blockquote>The following error is reproducible:
+testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher)  Time elapsed: 14.911 sec  &lt;&lt;&lt; ERROR!
+java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=RUNNING, ExpectedStates=FINISHED,FAILED,KILLED
+	at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447)
+	at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352)
+	at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:147)
+See https://builds.apache.org/job/Hadoop-Yarn-trunk/435</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1541">YARN-1541</a>.
+     Major bug reported by Jian He and fixed by Jian He <br>
+     <b>Invalidate AM Host/Port when app attempt is done so that in the mean-while client doesn&#8217;t get wrong information.</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1527">YARN-1527</a>.
+     Trivial bug reported by Jian He and fixed by Akira AJISAKA <br>
+     <b>yarn rmadmin command prints wrong usage info:</b><br>
+     <blockquote>The usage should be: yarn rmadmin, instead of java RMAdmin, and the -refreshQueues should be in the second line.
+{code} Usage: java RMAdmin   -refreshQueues 
+   -refreshNodes 
+   -refreshSuperUserGroupsConfiguration 
+   -refreshUserToGroupsMappings 
+   -refreshAdminAcls 
+   -refreshServiceAcl 
+   -getGroups [username]
+   -help [cmd]
+   -transitionToActive &lt;serviceId&gt;
+   -transitionToStandby &lt;serviceId&gt;
+   -failover [--forcefence] [--forceactive] &lt;serviceId&gt; &lt;serviceId&gt;
+   -getServiceState &lt;serviceId&gt;
+   -checkHealth &lt;serviceId&gt;
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1523">YARN-1523</a>.
+     Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
+     <b>Use StandbyException instead of RMNotYetReadyException</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1522">YARN-1522</a>.
+     Major bug reported by Liyin Liang and fixed by Liyin Liang <br>
+     <b>TestApplicationCleanup.testAppCleanup occasionally fails</b><br>
+     <blockquote>TestApplicationCleanup is occasionally failing with the error:
+Test set: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
+Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.215 sec &lt;&lt;&lt; FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
+testAppCleanup(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup) Time elapsed: 5.555 sec &lt;&lt;&lt; FAILURE!
+junit.framework.AssertionFailedError: expected:&lt;1&gt; but was:&lt;0&gt;
+at org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup.testAppCleanup(TestApplicationCleanup.java:119)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1505">YARN-1505</a>.
+     Blocker bug reported by Xuan Gong and fixed by Xuan Gong <br>
+     <b>WebAppProxyServer should not set localhost as YarnConfiguration.PROXY_ADDRESS by itself</b><br>
+     <blockquote>At WebAppProxyServer::startServer(), it will set up  YarnConfiguration.PROXY_ADDRESS to localhost:9099 by itself. So, no matter what is the value we set YarnConfiguration.PROXY_ADDRESS in configuration, the proxyserver will bind to localhost:9099</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1491">YARN-1491</a>.
+     Trivial bug reported by Jonathan Eagles and fixed by Chen He <br>
+     <b>Upgrade JUnit3 TestCase to JUnit 4</b><br>
+     <blockquote>There are still four references to test classes that extend from junit.framework.TestCase
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1485">YARN-1485</a>.
+     Major sub-task reported by Xuan Gong and fixed by Xuan Gong <br>
+     <b>Enabling HA should verify the RM service addresses configurations have been set for every RM Ids defined in RM_HA_IDs</b><br>
+     <blockquote>After YARN-1325, the YarnConfiguration.RM_HA_IDS will contain multiple RM_Ids. We need to verify that the RM service addresses configurations have been set for all of RM_Ids.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1482">YARN-1482</a>.
+     Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Xuan Gong <br>
+     <b>WebApplicationProxy should be always-on w.r.t HA even if it is embedded in the RM</b><br>
+     <blockquote>This way, even if an RM goes to standby mode, we can affect a redirect to the active. And more importantly, users will not suddenly see all their links stop working.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1481">YARN-1481</a>.
+     Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
+     <b>Move internal services logic from AdminService to ResourceManager</b><br>
+     <blockquote>This is something I found while reviewing YARN-1318, but didn't halt that patch as many cycles went there already. Some top level issues
+ - Not easy to follow RM's service life cycle
+    -- RM adds only AdminService as its service directly.
+    -- Other services are added to RM when AdminService's init calls RM.activeServices.init()
+ - Overall, AdminService shouldn't encompass all of RM's HA state management. It was originally supposed to be the implementation of just the RPC server.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1463">YARN-1463</a>.
+     Major test reported by Ted Yu and fixed by Vinod Kumar Vavilapalli <br>
+     <b>Tests should avoid starting http-server where possible or creates spnego keytab/principals</b><br>
+     <blockquote>Here is stack trace:
+testContainerManager[1](org.apache.hadoop.yarn.server.TestContainerManagerSecurity)  Time elapsed: 1.756 sec  &lt;&lt;&lt; ERROR!
+org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
+  at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:253)
+  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
+  at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
+  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
+  at org.apache.hadoop.yarn.server.TestContainerManagerSecurity.testContainerManager(TestContainerManagerSecurity.java:110)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1454">YARN-1454</a>.
+     Critical bug reported by Jian He and fixed by Karthik Kambatla <br>
+     <b>TestRMRestart.testRMDelegationTokenRestoredOnRMRestart is failing intermittently </b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1451">YARN-1451</a>.
+     Minor bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
+     <b>TestResourceManager relies on the scheduler assigning multiple containers in a single node update</b><br>
+     <blockquote>TestResourceManager rely on the capacity scheduler.
+It relies on a scheduler that assigns multiple containers in a single heartbeat, which not all schedulers do by default.  It also relies on schedulers that don't consider CPU capacities.  It would be simple to change the test to use multiple heartbeats and increase the vcore capacities of the nodes in the test.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1450">YARN-1450</a>.
+     Major bug reported by Akira AJISAKA and fixed by Binglin Chang (applications/distributed-shell)<br>
+     <b>TestUnmanagedAMLauncher#testDSShell fails on trunk</b><br>
+     <blockquote>TestUnmanagedAMLauncher fails on trunk. The console output is
+Running org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher
+Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 35.937 sec &lt;&lt;&lt; FAILURE! - in org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher
+testDSShell(org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher)  Time elapsed: 14.558 sec  &lt;&lt;&lt; ERROR!
+java.lang.RuntimeException: Failed to receive final expected state in ApplicationReport, CurrentState=ACCEPTED, ExpectedStates=FINISHED,FAILED,KILLED
+	at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.monitorApplication(UnmanagedAMLauncher.java:447)
+	at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:352)
+	at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:145)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1448">YARN-1448</a>.
+     Major sub-task reported by Wangda Tan and fixed by Wangda Tan (api , resourcemanager)<br>
+     <b>AM-RM protocol changes to support container resizing</b><br>
+     <blockquote>As described in YARN-1197, we need add API in RM to support
+1) Add increase request in AllocateRequest
+2) Can get successfully increased/decreased containers from RM in AllocateResponse</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1447">YARN-1447</a>.
+     Major sub-task reported by Wangda Tan and fixed by Wangda Tan (api)<br>
+     <b>Common PB type definitions for container resizing</b><br>
+     <blockquote>As described in YARN-1197, we need add some common PB types for container resource change, like ResourceChangeContext, etc. These types will be both used by RM/NM protocols</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1446">YARN-1446</a>.
+     Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
+     <b>Change killing application to wait until state store is done</b><br>
+     <blockquote>When user kills an application, it should wait until the state store is done with saving the killed status of the application. Otherwise, if RM crashes in the middle between user killing the application and writing the status to the store, RM will relaunch this application after it restarts.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1435">YARN-1435</a>.
+     Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
+     <b>Distributed Shell should not run other commands except "sh", and run the custom script at the same time.</b><br>
+     <blockquote>Currently, if we want to run custom script at DS. We can do it like this :
+--shell_command sh --shell_script custom_script.sh
+But it may be better to separate running shell_command and shell_script</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1425">YARN-1425</a>.
+     Major bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
+     <b>TestRMRestart fails because MockRM.waitForState(AttemptId) uses current attempt instead of the attempt passed as argument</b><br>
+     <blockquote>TestRMRestart is failing on trunk. Fixing it. </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1423">YARN-1423</a>.
+     Major improvement reported by Sandy Ryza and fixed by Ted Malaska (scheduler)<br>
+     <b>Support queue placement by secondary group in the Fair Scheduler</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1419">YARN-1419</a>.
+     Minor bug reported by Jonathan Eagles and fixed by Jonathan Eagles (scheduler)<br>
+     <b>TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 </b><br>
+     <blockquote>QueueMetrics holds its data in a static variable causing metrics to bleed over from test to test. clearQueueMetrics is to be called for tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, and in the case make the metrics unreliable.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1416">YARN-1416</a>.
+     Major bug reported by Omkar Vinit Joshi and fixed by Jian He <br>
+     <b>InvalidStateTransitions getting reported in multiple test cases even though they pass</b><br>
+     <blockquote>It might be worth checking why they are reporting this.
+Testcase : TestRMAppTransitions, TestRM
+there are large number of such errors.
+can't handle RMAppEventType.APP_UPDATE_SAVED at RMAppState.FAILED
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1411">YARN-1411</a>.
+     Critical sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
+     <b>HA config shouldn't affect NodeManager RPC addresses</b><br>
+     <blockquote>When HA is turned on, {{YarnConfiguration#getSoketAddress()}} fetches rpc-addresses corresponding to the specified rm-id. This should only be for RM rpc-addresses. Other confs, like NM rpc-addresses shouldn't be affected by this.
+Currently, the NM address settings in yarn-site.xml aren't reflected in the actual ports.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1409">YARN-1409</a>.
+     Major bug reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA <br>
+     <b>NonAggregatingLogHandler can throw RejectedExecutionException</b><br>
+     <blockquote>This problem is caused by handling APPLICATION_FINISHED events after calling sched.shotdown() in NonAggregatingLongHandler#serviceStop(). org.apache.hadoop.mapred.TestJobCleanup can fail because of RejectedExecutionException by NonAggregatingLogHandler.
+2013-11-13 10:53:06,970 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(166)) - Error in dispatcher thread
+java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@d51df63 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@7a20e369[Shutting down, pool size = 4, active threads = 0, queued tasks = 7, completed tasks = 0]
+        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
+        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
+        at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
+        at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
+        at org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:121)
+        at org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.handle(NonAggregatingLogHandler.java:49)
+        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:159)
+        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:95)
+        at java.lang.Thread.run(Thread.java:724)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1407">YARN-1407</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
+     <b>RM Web UI and REST APIs should uniformly use YarnApplicationState</b><br>
+     <blockquote>RMAppState isn't a public facing enum like YarnApplicationState, so we shouldn't return values or list filters that come from it. However, some Blocks and AppInfo are still using RMAppState.
+It is not 100% clear to me whether or not fixing this would be a backwards-incompatible change.  The change would only reduce the set of possible strings that the API returns, so I think not.  We have also been changing the contents of RMAppState since 2.2.0, e.g. in YARN-891. It would still be good to fix this ASAP (i.e. for 2.2.1).</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1405">YARN-1405</a>.
+     Major sub-task reported by Yesha Vora and fixed by Jian He <br>
+     <b>RM hangs on shutdown if calling system.exit in serviceInit or serviceStart</b><br>
+     <blockquote>Enable yarn.resourcemanager.recovery.enabled=true and Pass a local path to yarn.resourcemanager.fs.state-store.uri. such as "file:///tmp/MYTMP"
+if the directory  /tmp/MYTMP is not readable or writable, RM should crash and should print "Permission denied Error"
+Currently, RM throws "java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist" Error. RM returns Exiting status 1 but RM process does not shutdown. 
+Snapshot of Resource manager log:
+2013-09-27 18:31:36,621 INFO  security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:rollMasterKey(97)) - Rolling master-key for nm-tokens
+2013-09-27 18:31:36,694 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(640)) - Failed to load/recover state
+java.io.FileNotFoundException: File file:/tmp/MYTMP/FSRMStateRoot/RMDTSecretManagerRoot does not exist
+        at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:379)
+        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1478)
+        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1518)
+        at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:564)
+        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadRMDTSecretManagerState(FileSystemRMStateStore.java:188)
+        at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.loadState(FileSystemRMStateStore.java:112)
+        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:635)
+        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
+        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:855)
+2013-09-27 18:31:36,697 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1403">YARN-1403</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza <br>
+     <b>Separate out configuration loading from QueueManager in the Fair Scheduler</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1401">YARN-1401</a>.
+     Major bug reported by Gera Shegalov and fixed by Gera Shegalov (nodemanager)<br>
+     <b>With zero sleep-delay-before-sigkill.ms, no signal is ever sent</b><br>
+     <blockquote>If you set in yarn-site.xml yarn.nodemanager.sleep-delay-before-sigkill.ms=0 then an unresponsive child JVM is never killed. In MRv1, TT used to immediately SIGKILL in this case. </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1400">YARN-1400</a>.
+     Trivial bug reported by Raja Aluri and fixed by Raja Aluri (resourcemanager)<br>
+     <blockquote>yarn.cmd uses HADOOP_RESOURCEMANAGER_OPTS. Should be YARN_RESOURCEMANAGER_OPTS.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1395">YARN-1395</a>.
+     Major bug reported by Chris Nauroth and fixed by Chris Nauroth (applications/distributed-shell)<br>
+     <b>Distributed shell application master launched with debug flag can hang waiting for external ls process.</b><br>
+     <blockquote>Distributed shell launched with the debug flag will run {{ApplicationMaster#dumpOutDebugInfo}}.  This method launches an external process to run ls and print the contents of the current working directory.  We've seen that this can cause the application master to hang on {{Process#waitFor}}.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1392">YARN-1392</a>.
+     Major new feature reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>Allow sophisticated app-to-queue placement policies in the Fair Scheduler</b><br>
+     <blockquote>Currently the Fair Scheduler supports app-to-queue placement by username.  It would be beneficial to allow more sophisticated policies that rely on primary and secondary groups and fallbacks.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1388">YARN-1388</a>.
+     Trivial bug reported by Liyin Liang and fixed by Liyin Liang (resourcemanager)<br>
+     <b>Fair Scheduler page always displays blank fair share</b><br>
+     <blockquote>YARN-1044 fixed min/max/used resource display problem in the scheduler  page. But the "Fair Share" has the same problem and need to fix it.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1387">YARN-1387</a>.
+     Major improvement reported by Karthik Kambatla and fixed by Karthik Kambatla (api)<br>
+     <b>RMWebServices should use ClientRMService for filtering applications</b><br>
+     <blockquote>YARN's REST API allows filtering applications, this should be moved to ClientRMService to allow Java API also support the same functionality.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1386">YARN-1386</a>.
+     Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
+     <b>NodeManager mistakenly loses resources and relocalizes them</b><br>
+     <blockquote>When a local resource that should already be present is requested again, the nodemanager checks to see if it still present.  However the method it uses to check for presence is via File.exists() as the user of the nodemanager process. If the resource was a private resource localized for another user, it will be localized to a location that is not accessible by the nodemanager user.  Therefore File.exists() returns false, the nodemanager mistakenly believes the resource is no longer available, and it proceeds to localize it over and over.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1381">YARN-1381</a>.
+     Minor bug reported by Ted Yu and fixed by Ted Yu <br>
+     <b>Same relaxLocality appears twice in exception message of AMRMClientImpl#checkLocalityRelaxationConflict() </b><br>
+     <blockquote>Here is related code:
+            throw new InvalidContainerRequestException("Cannot submit a "
+                + "ContainerRequest asking for location " + location
+                + " with locality relaxation " + relaxLocality + " when it has "
+                + "already been requested with locality relaxation " + relaxLocality);
+The last relaxLocality should be  reqs.values().iterator().next().remoteRequest.getRelaxLocality() </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1378">YARN-1378</a>.
+     Major sub-task reported by Jian He and fixed by Jian He (resourcemanager)<br>
+     <b>Implement a RMStateStore cleaner for deleting application/attempt info</b><br>
+     <blockquote>Now that we are storing the final state of application/attempt instead of removing application/attempt info on application/attempt completion(YARN-891), we need a separate RMStateStore cleaner for cleaning the application/attempt state.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1374">YARN-1374</a>.
+     Blocker bug reported by Devaraj K and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>Resource Manager fails to start due to ConcurrentModificationException</b><br>
+     <blockquote>Resource Manager is failing to start with the below ConcurrentModificationException.
+2013-10-30 20:22:42,371 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list
+2013-10-30 20:22:42,376 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.util.ConcurrentModificationException
+	at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
+	at java.util.AbstractList$Itr.next(AbstractList.java:343)
+	at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
+	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
+	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
+	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
+	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
+2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioning to standby
+2013-10-30 20:22:42,378 INFO org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService: Transitioned to standby
+2013-10-30 20:22:42,378 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
+	at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
+	at java.util.AbstractList$Itr.next(AbstractList.java:343)
+	at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)
+	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
+	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
+	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
+	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:944)
+2013-10-30 20:22:42,379 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
+SHUTDOWN_MSG: Shutting down ResourceManager at HOST-10-18-40-24/
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1358">YARN-1358</a>.
+     Minor test reported by Chuan Liu and fixed by Chuan Liu (client)<br>
+     <b>TestYarnCLI fails on Windows due to line endings</b><br>
+     <blockquote>The unit test fails on Windows due to incorrect line endings was used for comparing the output from command line output. Error messages are as follows.
+junit.framework.ComparisonFailure: expected:&lt;...argument for options[]
+usage: application
+...&gt; but was:&lt;...argument for options[
+usage: application
+	at junit.framework.Assert.assertEquals(Assert.java:85)
+	at junit.framework.Assert.assertEquals(Assert.java:91)
+	at org.apache.hadoop.yarn.client.cli.TestYarnCLI.testMissingArguments(TestYarnCLI.java:878)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1357">YARN-1357</a>.
+     Minor test reported by Chuan Liu and fixed by Chuan Liu (nodemanager)<br>
+     <b>TestContainerLaunch.testContainerEnvVariables fails on Windows</b><br>
+     <blockquote>This test fails on Windows due to incorrect use of batch script command. Error messages are as follows.
+junit.framework.AssertionFailedError: expected:&lt;java.nio.HeapByteBuffer[pos=0 lim=19 cap=19]&gt; but was:&lt;java.nio.HeapByteBuffer[pos=0 lim=19 cap=19]&gt;
+	at junit.framework.Assert.fail(Assert.java:50)
+	at junit.framework.Assert.failNotEquals(Assert.java:287)
+	at junit.framework.Assert.assertEquals(Assert.java:67)
+	at junit.framework.Assert.assertEquals(Assert.java:74)
+	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:508)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1351">YARN-1351</a>.
+     Trivial bug reported by Konstantin Weitz and fixed by Konstantin Weitz (resourcemanager)<br>
+     <b>Invalid string format in Fair Scheduler log warn message</b><br>
+     <blockquote>While trying to print a warning, two values of the wrong type (Resource instead of int) are passed into a String.format method call, leading to a runtime exception, in the file:
+The warning was intended to be printed whenever the resources don't fit into each other, either because the number of virtual cores or the memory is too small. I changed the %d's into %s, this way the warning will contain both the cores and the memory.
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1349">YARN-1349</a>.
+     Major bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
+     <b>yarn.cmd does not support passthrough to any arbitrary class.</b><br>
+     <blockquote>The yarn shell script supports passthrough to calling any arbitrary class if the first argument is not one of the per-defined sub-commands.  The equivalent cmd script does not implement this and instead fails trying to do a labeled goto to the first argument.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1343">YARN-1343</a>.
+     Critical bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (resourcemanager)<br>
+     <b>NodeManagers additions/restarts are not reported as node updates in AllocateResponse responses to AMs</b><br>
+     <blockquote>If a NodeManager joins the cluster or gets restarted, running AMs never receive the node update indicating the Node is running.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1335">YARN-1335</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>Move duplicate code from FSSchedulerApp and FiCaSchedulerApp into SchedulerApplication</b><br>
+     <blockquote>FSSchedulerApp and FiCaSchedulerApp use duplicate code in a lot of places.  They both extend SchedulerApplication.  We can move a lot of this duplicate code into SchedulerApplication.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1333">YARN-1333</a>.
+     Major improvement reported by Sandy Ryza and fixed by Tsuyoshi OZAWA (scheduler)<br>
+     <b>Support blacklisting in the Fair Scheduler</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1332">YARN-1332</a>.
+     Minor improvement reported by Sandy Ryza and fixed by Sebastian Wong <br>
+     <b>In TestAMRMClient, replace assertTrue with assertEquals where possible</b><br>
+     <blockquote>TestAMRMClient uses a lot of "assertTrue(amClient.ask.size() == 0)" where "assertEquals(0, amClient.ask.size())" would make it easier to see why it's failing at a glance.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1331">YARN-1331</a>.
+     Trivial bug reported by Chris Nauroth and fixed by Chris Nauroth (client)<br>
+     <b>yarn.cmd exits with NoClassDefFoundError trying to run rmadmin or logs</b><br>
+     <blockquote>The yarn shell script was updated so that the rmadmin and logs sub-commands launch {{org.apache.hadoop.yarn.client.cli.RMAdminCLI}} and {{org.apache.hadoop.yarn.client.cli.LogsCLI}}.  The yarn.cmd script also needs to be updated so that the commands work on Windows.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1325">YARN-1325</a>.
+     Major sub-task reported by Tsuyoshi OZAWA and fixed by Xuan Gong (resourcemanager)<br>
+     <b>Enabling HA should check Configuration contains multiple RMs</b><br>
+     <blockquote>Currently, we can enable RM HA configuration without multiple RM ids(YarnConfiguration.RM_HA_IDS).  This behaviour can cause wrong operations. ResourceManager should verify that more than 1 RM id must be specified in RM-HA-IDs.
+One idea is to support "strict mode" to enforce this check as configuration(e.g. yarn.resourcemanager.ha.strict-mode.enabled).</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1323">YARN-1323</a>.
+     Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
+     <b>Set HTTPS webapp address along with other RPC addresses in HAUtil</b><br>
+     <blockquote>YARN-1232 adds the ability to configure multiple RMs, but missed out the https web app address. Need to add that in.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1321">YARN-1321</a>.
+     Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (client)<br>
+     <b>NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly</b><br>
+     <blockquote>NMTokenCache is a singleton. Because of this, if running multiple AMs in a single JVM NMTokens for the same node from different AMs step on each other and starting containers fail due to mismatch tokens.
+The error observed in the client side is something like:
+ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. 
+NMToken for application attempt : appattempt_1382038445650_0002_000001 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_000001
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1320">YARN-1320</a>.
+     Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
+     <b>Custom log4j properties in Distributed shell does not work properly.</b><br>
+     <blockquote>Distributed shell cannot pick up custom log4j properties (specified with -log_properties). It always uses default log4j properties.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1318">YARN-1318</a>.
+     Blocker sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>Promote AdminService to an Always-On service and merge in RMHAProtocolService</b><br>
+     <blockquote>Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to move AdminService an Always-On service. </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1315">YARN-1315</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
+     <b>TestQueueACLs should also test FairScheduler</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1314">YARN-1314</a>.
+     Major bug reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
+     <b>Cannot pass more than 1 argument to shell command</b><br>
+     <blockquote>Distributed shell cannot accept more than 1 parameters in argument parts.
+All of these commands are treated as 1 parameter:
+/usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar &lt;distrubuted shell jar&gt; -shell_command echo -shell_args "'"My   name"                "is  Teddy"'"
+/usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar &lt;distrubuted shell jar&gt; -shell_command echo -shell_args "''My   name'                'is  Teddy''"
+/usr/bin/yarn  org.apache.hadoop.yarn.applications.distributedshell.Client -jar &lt;distrubuted shell jar&gt; -shell_command echo -shell_args "'My   name'                'is  Teddy'"</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1311">YARN-1311</a>.
+     Trivial sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
+     <b>Fix app specific scheduler-events' names to be app-attempt based</b><br>
+     <blockquote>Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are misnomers as schedulers only deal with AppAttempts today. This JIRA is for fixing their names so that we can add App-level events in the near future, notably for work-preserving RM-restart.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1307">YARN-1307</a>.
+     Major sub-task reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
+     <b>Rethink znode structure for RM HA</b><br>
+     <blockquote>Rethink for znode structure for RM HA is proposed in some JIRAs(YARN-659, YARN-1222). The motivation of this JIRA is quoted from Bikas' comment in YARN-1222:
+We should move to creating a node hierarchy for apps such that all znodes for an app are stored under an app znode instead of the app root znode. This will help in removeApplication and also in scaling better on ZK. The earlier code was written this way to ensure create/delete happens under a root znode for fencing. But given that we have moved to multi-operations globally, this isnt required anymore.
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1306">YARN-1306</a>.
+     Major bug reported by Wei Yan and fixed by Wei Yan <br>
+     <b>Clean up hadoop-sls sample-conf according to YARN-1228</b><br>
+     <blockquote>Move fair scheduler allocations configuration to fair-scheduler.xml, and move all scheduler stuffs to yarn-site.xml</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1305">YARN-1305</a>.
+     Major sub-task reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
+     <b>RMHAProtocolService#serviceInit should handle HAUtil's IllegalArgumentException</b><br>
+     <blockquote>When yarn.resourcemanager.ha.enabled is true, RMHAProtocolService#serviceInit calls HAUtil.setAllRpcAddresses. If the configuration values are null, it just throws IllegalArgumentException.
+It's messy to analyse which keys are null, so we should handle it and log the name of keys which are null.
+A current log dump is as follows:
+2013-10-15 06:24:53,431 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
+2013-10-15 06:24:54,203 INFO org.apache.hadoop.service.AbstractService: Service RMHAProtocolService failed in state INITED; cause: java.lang.IllegalArgumentException: Property value must not be null
+java.lang.IllegalArgumentException: Property value must not be null
+        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
+        at org.apache.hadoop.conf.Configuration.set(Configuration.java:816)
+        at org.apache.hadoop.conf.Configuration.set(Configuration.java:798)
+        at org.apache.hadoop.yarn.conf.HAUtil.setConfValue(HAUtil.java:100)
+        at org.apache.hadoop.yarn.conf.HAUtil.setAllRpcAddresses(HAUtil.java:105)
+        at org.apache.hadoop.yarn.server.resourcemanager.RMHAProtocolService.serviceInit(RMHAProtocolService.java:60)
+        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
+        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
+        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:187)
+        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
+        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:940)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1303">YARN-1303</a>.
+     Major improvement reported by Tassapol Athiapinya and fixed by Xuan Gong (applications/distributed-shell)<br>
+     <b>Allow multiple commands separating with ";" in distributed-shell</b><br>
+     <blockquote>In shell, we can do "ls; ls" to run 2 commands at once. 
+In distributed shell, this is not working. We should improve to allow this to occur. There are practical use cases that I know of to run multiple commands or to set environment variables before a command.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1300">YARN-1300</a>.
+     Major bug reported by Ted Yu and fixed by Ted Yu <br>
+     <b>SLS tests fail because conf puts yarn properties in fair-scheduler.xml</b><br>
+     <blockquote>I was looking at https://builds.apache.org/job/PreCommit-YARN-Build/2165//testReport/org.apache.hadoop.yarn.sls/TestSLSRunner/testSimulatorRunning/
+I am able to reproduce the failure locally.
+I found that FairSchedulerConfiguration.getAllocationFile() doesn't read the yarn.scheduler.fair.allocation.file config entry from fair-scheduler.xml
+This leads to the following:
+Caused by: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Bad fair scheduler config file: top-level element not &lt;allocations&gt;
+	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.reloadAllocs(QueueManager.java:302)
+	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.initialize(QueueManager.java:108)
+	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.reinitialize(FairScheduler.java:1145)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1295">YARN-1295</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza (nodemanager)<br>
+     <b>In UnixLocalWrapperScriptBuilder, using bash -c can cause "Text file busy" errors</b><br>
+     <blockquote>I missed this when working on YARN-1271.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1293">YARN-1293</a>.
+     Major bug reported by Tsuyoshi OZAWA and fixed by Tsuyoshi OZAWA <br>
+     <b>TestContainerLaunch.testInvalidEnvSyntaxDiagnostics fails on trunk</b><br>
+     <blockquote>{quote}
+Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
+Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.655 sec &lt;&lt;&lt; FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch
+testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch)  Time elapsed: 0.114 sec  &lt;&lt;&lt; FAILURE!
+junit.framework.AssertionFailedError: null
+        at junit.framework.Assert.fail(Assert.java:48)
+        at junit.framework.Assert.assertTrue(Assert.java:20)
+        at junit.framework.Assert.assertTrue(Assert.java:27)
+        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:273)
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1290">YARN-1290</a>.
+     Major improvement reported by Wei Yan and fixed by Wei Yan <br>
+     <b>Let continuous scheduling achieve more balanced task assignment</b><br>
+     <blockquote>Currently, in continuous scheduling (YARN-1010), in each round, the thread iterates over pre-ordered nodes and assigns tasks. This mechanism may overload the first several nodes, while the latter nodes have no tasks.
+We should sort all nodes according to available resource. In each round, always assign tasks to nodes with larger capacity, which can balance the load distribution among all nodes.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1288">YARN-1288</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>Make Fair Scheduler ACLs more user friendly</b><br>
+     <blockquote>The Fair Scheduler currently defaults the root queue's acl to empty and all other queues' acl to "*".  Now that YARN-1258 enables configuring the root queue, we should reverse this.  This will also bring the Fair Scheduler in line with the Capacity Scheduler.
+We should also not trim the acl strings, which makes it impossible to only specify groups in an acl.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1284">YARN-1284</a>.
+     Blocker bug reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur (nodemanager)<br>
+     <b>LCE: Race condition leaves dangling cgroups entries for killed containers</b><br>
+     <blockquote>When LCE &amp; cgroups are enabled, when a container is is killed (in this case by its owning AM, an MRAM) it seems to be a race condition at OS level when doing a SIGTERM/SIGKILL and when the OS does all necessary cleanup. 
+LCE code, after sending the SIGTERM/SIGKILL and getting the exitcode, immediately attempts to clean up the cgroups entry for the container. But this is failing with an error like:
+2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_1381179532433_0016_01_000011 is : 143
+2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Processing container_1381179532433_0016_01_000011 of type UPDATE_DIAGNOSTICS_MSG
+2013-10-07 15:21:24,359 DEBUG org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: deleteCgroup: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_000011
+2013-10-07 15:21:24,359 WARN org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler: Unable to delete cgroup at: /run/cgroups/cpu/hadoop-yarn/container_1381179532433_0016_01_000011
+CgroupsLCEResourcesHandler.clearLimits() has logic to wait for 500 ms for AM containers to avoid this problem. it seems this should be done for all containers.
+Still, waiting for extra 500ms seems too expensive.
+We should look at a way of doing this in a more 'efficient way' from time perspective, may be spinning while the deleteCgroup() cannot be done with a minimal sleep and a timeout.
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1283">YARN-1283</a>.
+     Major sub-task reported by Yesha Vora and fixed by Omkar Vinit Joshi <br>
+     <b>Invalid 'url of job' mentioned in Job output with yarn.http.policy=HTTPS_ONLY</b><br>
+     <blockquote>After setting yarn.http.policy=HTTPS_ONLY, the job output shows incorrect "The url to track the job".
+Currently, its printing http://RM:&lt;httpsport&gt;/proxy/application_1381162886563_0001/ instead https://RM:&lt;httpsport&gt;/proxy/application_1381162886563_0001/
+http://hostname:8088/proxy/application_1381162886563_0001/ is invalid
+hadoop  jar hadoop-mapreduce-client-jobclient-tests.jar sleep -m 1 -r 1 
+13/10/07 18:39:39 INFO client.RMProxy: Connecting to ResourceManager at hostname/
+13/10/07 18:39:40 INFO mapreduce.JobSubmitter: number of splits:1
+13/10/07 18:39:40 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
+13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.partitioner.class is deprecated. Instead, use mapreduce.job.partitioner.class
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
+13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
+13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
+13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
+13/10/07 18:39:40 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
+13/10/07 18:39:40 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
+13/10/07 18:39:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1381162886563_0001
+13/10/07 18:39:40 INFO impl.YarnClientImpl: Submitted application application_1381162886563_0001 to ResourceManager at hostname/
+13/10/07 18:39:40 INFO mapreduce.Job: The url to track the job: http://hostname:8088/proxy/application_1381162886563_0001/
+13/10/07 18:39:40 INFO mapreduce.Job: Running job: job_1381162886563_0001
+13/10/07 18:39:46 INFO mapreduce.Job: Job job_1381162886563_0001 running in uber mode : false
+13/10/07 18:39:46 INFO mapreduce.Job:  map 0% reduce 0%
+13/10/07 18:39:53 INFO mapreduce.Job:  map 100% reduce 0%
+13/10/07 18:39:58 INFO mapreduce.Job:  map 100% reduce 100%
+13/10/07 18:39:58 INFO mapreduce.Job: Job job_1381162886563_0001 completed successfully
+13/10/07 18:39:58 INFO mapreduce.Job: Counters: 43
+	File System Counters
+		FILE: Number of bytes read=26
+		FILE: Number of bytes written=177279
+		FILE: Number of read operations=0
+		FILE: Number of large read operations=0
+		FILE: Number of write operations=0
+		HDFS: Number of bytes read=48
+		HDFS: Number of bytes written=0
+		HDFS: Number of read operations=1
+		HDFS: Number of large read operations=0
+		HDFS: Number of write operations=0
+	Job Counters 
+		Launched map tasks=1
+		Launched reduce tasks=1
+		Other local map tasks=1
+		Total time spent by all maps in occupied slots (ms)=7136
+		Total time spent by all reduces in occupied slots (ms)=6062
+	Map-Reduce Framework
+		Map input records=1
+		Map output records=1
+		Map output bytes=4
+		Map output materialized bytes=22
+		Input split bytes=48
+		Combine input records=0
+		Combine output records=0
+		Reduce input groups=1
+		Reduce shuffle bytes=22
+		Reduce input records=1
+		Reduce output records=0
+		Spilled Records=2
+		Shuffled Maps =1
+		Failed Shuffles=0
+		Merged Map outputs=1
+		GC time elapsed (ms)=60
+		CPU time spent (ms)=1700
+		Physical memory (bytes) snapshot=567582720
+		Virtual memory (bytes) snapshot=4292997120
+		Total committed heap usage (bytes)=846594048
+	Shuffle Errors
+		BAD_ID=0
+	File Input Format Counters 
+		Bytes Read=0
+	File Output Format Counters 
+		Bytes Written=0
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1268">YARN-1268</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>TestFairScheduler.testContinuousScheduling is flaky</b><br>
+     <blockquote>It looks like there's a timeout in it that's causing it to be flaky.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1265">YARN-1265</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza (resourcemanager , scheduler)<br>
+     <b>Fair Scheduler chokes on unhealthy node reconnect</b><br>
+     <blockquote>Only nodes in the RUNNING state are tracked by schedulers.  When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state.  The FairScheduler doesn't guard against this.
+I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1259">YARN-1259</a>.
+     Trivial bug reported by Sandy Ryza and fixed by Robert Kanter (scheduler)<br>
+     <b>In Fair Scheduler web UI, queue num pending and num active apps switched</b><br>
+     <blockquote>The values returned in FairSchedulerLeafQueueInfo by numPendingApplications and numActiveApplications should be switched.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1258">YARN-1258</a>.
+     Major improvement reported by Sandy Ryza and fixed by Sandy Ryza (scheduler)<br>
+     <b>Allow configuring the Fair Scheduler root queue</b><br>
+     <blockquote>This would be useful for acls, maxRunningApps, scheduling modes, etc.
+The allocation file should be able to accept both:
+* An implicit root queue
+* A root queue at the top of the hierarchy with all queues under/inside of it</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1253">YARN-1253</a>.
+     Blocker new feature reported by Alejandro Abdelnur and fixed by Roman Shaposhnik (nodemanager)<br>
+     <b>Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode</b><br>
+     <blockquote>When using cgroups we require LCE to be configured in the cluster to start containers. 
+When LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in an un-secure setup this presents a couple issues:
+* LCE requires all Hadoop users submitting jobs to be Unix users in all nodes
+* Because users can impersonate other users, any user would have access to any local file of other users
+Particularly, the second issue is not desirable as a user could get access to ssh keys of other users in the nodes or if there are NFS mounts, get to other users data outside of the cluster.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1241">YARN-1241</a>.
+     Major bug reported by Sandy Ryza and fixed by Sandy Ryza <br>
+     <b>In Fair Scheduler, maxRunningApps does not work for non-leaf queues</b><br>
+     <blockquote>Setting the maxRunningApps property on a parent queue should make it that the sum of apps in all subqueues can't exceed it</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1239">YARN-1239</a>.
+     Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
+     <b>Save version information in the state store</b><br>
+     <blockquote>When creating root dir for the first time we should write version 1. If root dir exists then we should check that the version in the state store matches the version from config.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1232">YARN-1232</a>.
+     Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>Configuration to support multiple RMs</b><br>
+     <blockquote>We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1222">YARN-1222</a>.
+     Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
+     <b>Make improvements in ZKRMStateStore for fencing</b><br>
+     <blockquote>Using multi-operations for every ZK interaction. 
+In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1210">YARN-1210</a>.
+     Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Omkar Vinit Joshi <br>
+     <b>During RM restart, RM should start a new attempt only when previous attempt exits for real</b><br>
+     <blockquote>When RM recovers, it can wait for existing AMs to contact RM back and then kill them forcefully before even starting a new AM. Worst case, RM will start a new AppAttempt after waiting for 10 mins ( the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help issues with downstream components like Pig, Hive and Oozie during RM restart.
+In the mean while, new apps will proceed as usual as existing apps wait for recovery.
+This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1199">YARN-1199</a>.
+     Major improvement reported by Mit Desai and fixed by Mit Desai <br>
+     <b>Make NM/RM Versions Available</b><br>
+     <blockquote>Now as we have the NM and RM Versions available, we can display the YARN version of nodes running in the cluster.
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1188">YARN-1188</a>.
+     Trivial bug reported by Akira AJISAKA and fixed by Tsuyoshi OZAWA <br>
+     <b>The context of QueueMetrics becomes 'default' when using FairScheduler</b><br>
+     <blockquote>I found the context of QueueMetrics changed to 'default' from 'yarn' when I was using FairScheduler.
+The context should always be 'yarn' by adding an annotation to FSQueueMetrics like below:
++ @Metrics(context="yarn")
+public class FSQueueMetrics extends QueueMetrics {
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1185">YARN-1185</a>.
+     Major sub-task reported by Jason Lowe and fixed by Omkar Vinit Joshi (resourcemanager)<br>
+     <b>FileSystemRMStateStore can leave partial files that prevent subsequent recovery</b><br>
+     <blockquote>FileSystemRMStateStore writes directly to the destination file when storing state. However if the RM were to crash in the middle of the write, the recovery method could encounter a partially-written file and either outright crash during recovery or silently load incomplete state.
+To avoid this, the data should be written to a temporary file and renamed to the destination file afterwards.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1183">YARN-1183</a>.
+     Major bug reported by Andrey Klochkov and fixed by Andrey Klochkov <br>
+     <b>MiniYARNCluster shutdown takes several minutes intermittently</b><br>
+     <blockquote>As described in MAPREDUCE-5501 sometimes M/R tests leave MRAppMaster java processes living for several minutes after successful completion of the corresponding test. There is a concurrency issue in MiniYARNCluster shutdown logic which leads to this. Sometimes RM stops before an app master sends it's last report, and then the app master keeps retrying for &gt;6 minutes. In some cases it leads to failures in subsequent tests, and it affects performance of tests as app masters eat resources.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1182">YARN-1182</a>.
+     Major bug reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
+     <b>MiniYARNCluster creates and inits the RM/NM only on start()</b><br>
+     <blockquote>MiniYARNCluster creates and inits the RM/NM only on start(). It should create and init() during init() itself.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1181">YARN-1181</a>.
+     Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla <br>
+     <b>Augment MiniYARNCluster to support HA mode</b><br>
+     <blockquote>MiniYARNHACluster, along the lines of MiniYARNCluster, is needed for end-to-end HA tests.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1180">YARN-1180</a>.
+     Trivial bug reported by Thomas Graves and fixed by Chen He (capacityscheduler)<br>
+     <b>Update capacity scheduler docs to include types on the configs</b><br>
+     <blockquote>The capacity scheduler docs (http://hadoop.apache.org/docs/r2.1.0-beta/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) don't include types for all the configs. For instance the minimum-user-limit-percent doesn't say its an Int.  It also the only setting for the Resource Allocation configs that is an Int rather then a float.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1176">YARN-1176</a>.
+     Critical bug reported by Thomas Graves and fixed by Jonathan Eagles (resourcemanager)<br>
+     <b>RM web services ClusterMetricsInfo total nodes doesn't include unhealthy nodes</b><br>
+     <blockquote>In the web services api for the cluster/metrics, the totalNodes reported doesn't include the unhealthy nodes.
+this.totalNodes = activeNodes + lostNodes + decommissionedNodes
+	        + rebootedNodes;</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1172">YARN-1172</a>.
+     Major sub-task reported by Karthik Kambatla and fixed by Tsuyoshi OZAWA (resourcemanager)<br>
+     <b>Convert *SecretManagers in the RM to services</b><br>
+     <blockquote></blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1145">YARN-1145</a>.
+     Major bug reported by Rohith and fixed by Rohith <br>
+     <b>Potential file handle leak in aggregated logs web ui</b><br>
+     <blockquote>Any problem in getting aggregated logs for rendering on web ui, then LogReader is not closed. 
+Now, it reader is not closed which causing many connections in close_wait state.
+hadoopuser@hadoopuser:&gt; jps
+*27909* JobHistoryServer
+DataNode port is 50010. When greped with DataNode port, many connections are in CLOSE_WAIT from JHS.
+hadoopuser@hadoopuser:&gt; netstat -tanlp |grep 50010
+tcp        0      0*               LISTEN      21453/java          
+tcp        1      0       CLOSE_WAIT  *27909*/java          
+tcp        1      0      CLOSE_WAIT  *27909*/java          
+tcp        1      0       CLOSE_WAIT  *27909*/java          
+tcp        1      0       CLOSE_WAIT  *27909*/java          
+tcp        1      0      CLOSE_WAIT  *27909*/java          </blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1138">YARN-1138</a>.
+     Major bug reported by Yingda Chen and fixed by Chuan Liu (api)<br>
+     <b>yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows</b><br>
+     <blockquote>yarn-default.xml has "yarn.application.classpath" entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/,$HADOOP_COMMON_HOME/share/hadoop/common/lib/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib. It does not work on Windows which needs to be fixed.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1121">YARN-1121</a>.
+     Major sub-task reported by Bikas Saha and fixed by Jian He (resourcemanager)<br>
+     <b>RMStateStore should flush all pending store events before closing</b><br>
+     <blockquote>on serviceStop it should wait for all internal pending events to drain before stopping.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1119">YARN-1119</a>.
+     Major test reported by Robert Parker and fixed by Mit Desai (resourcemanager)<br>
+     <b>Add ClusterMetrics checks to tho TestRMNodeTransitions tests</b><br>
+     <blockquote>YARN-1101 identified an issue where UNHEALTHY nodes could double decrement the active nodes. We should add checks for RUNNING node transitions.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1109">YARN-1109</a>.
+     Major improvement reported by Sandy Ryza and fixed by haosdent (nodemanager)<br>
+     <b>Demote NodeManager "Sending out status for container" logs to debug</b><br>
+     <blockquote>Diagnosing NodeManager and container launch problems is made more difficult by the enormous number of logs like
+Sending out status for container: container_id {, app_attempt_id {, application_id {, id: 18, cluster_timestamp: 1377559361179, }, attemptId: 1, }, id: 1337, }, state: C_RUNNING, diagnostics: "Container killed by the ApplicationMaster.\n", exit_status: -1000
+On an NM with a few containers I am seeing tens of these per second.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1101">YARN-1101</a>.
+     Major bug reported by Robert Parker and fixed by Robert Parker (resourcemanager)<br>
+     <b>Active nodes can be decremented below 0</b><br>
+     <blockquote>The issue is in RMNodeImpl where both RUNNING and UNHEALTHY states that transition to a deactive state (LOST, DECOMMISSIONED, REBOOTED) use the same DeactivateNodeTransition class.  The DeactivateNodeTransition class naturally decrements the active node, however the in cases where the node has transition to UNHEALTHY the active count has already been decremented.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1098">YARN-1098</a>.
+     Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>Separate out RM services into "Always On" and "Active"</b><br>
+     <blockquote>From discussion on YARN-1027, it makes sense to separate out services that are stateful and stateless. The stateless services can  run perennially irrespective of whether the RM is in Active/Standby state, while the stateful services need to  be started on transitionToActive() and completely shutdown on transitionToStandby().
+The external-facing stateless services should respond to the client/AM/NM requests depending on whether the RM is Active/Standby.
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1068">YARN-1068</a>.
+     Major sub-task reported by Karthik Kambatla and fixed by Karthik Kambatla (resourcemanager)<br>
+     <b>Add admin support for HA operations</b><br>
+     <blockquote>Support HA admin operations to facilitate transitioning the RM to Active and Standby states.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1060">YARN-1060</a>.
+     Major bug reported by Sandy Ryza and fixed by Niranjan Singh (scheduler)<br>
+     <b>Two tests in TestFairScheduler are missing @Test annotation</b><br>
+     <blockquote>Amazingly, these tests appear to pass with the annotations added.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1053">YARN-1053</a>.
+     Blocker bug reported by Omkar Vinit Joshi and fixed by Omkar Vinit Joshi <br>
+     <b>Diagnostic message from ContainerExitEvent is ignored in ContainerImpl</b><br>
+     <blockquote>If the container launch fails then we send ContainerExitEvent. This event contains exitCode and diagnostic message. Today we are ignoring diagnostic message while handling this event inside ContainerImpl. Fixing it as it is useful in diagnosing the failure.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1044">YARN-1044</a>.
+     Critical bug reported by Sangjin Lee and fixed by Sangjin Lee (resourcemanager , scheduler)<br>
+     <b>used/min/max resources do not display info in the scheduler page</b><br>
+     <blockquote>Go to the scheduler page in RM, and click any queue to display the detailed info. You'll find that none of the resources entries (used, min, or max) would display values.
+It is because the values contain brackets ("&lt;" and "&gt;") and are not properly html-escaped.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1033">YARN-1033</a>.
+     Major sub-task reported by Nemon Lou and fixed by Karthik Kambatla <br>
+     <b>Expose RM active/standby state to Web UI and REST API</b><br>
+     <blockquote>Both active and standby RM shall expose it's web server and show it's current state (active or standby) on web page. Users should be able to access this information through the REST API as well.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1029">YARN-1029</a>.
+     Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
+     <b>Allow embedding leader election into the RM</b><br>
+     <blockquote>It should be possible to embed common ActiveStandyElector into the RM such that ZooKeeper based leader election and notification is in-built. In conjunction with a ZK state store, this configuration will be a simple deployment option.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1028">YARN-1028</a>.
+     Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
+     <b>Add FailoverProxyProvider like capability to RMProxy</b><br>
+     <blockquote>RMProxy layer currently abstracts RM discovery and implements it by looking up service information from configuration. Motivated by HDFS and using existing classes from Common, we can add failover proxy providers that may provide RM discovery in extensible ways.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1027">YARN-1027</a>.
+     Major sub-task reported by Bikas Saha and fixed by Karthik Kambatla <br>
+     <b>Implement RMHAProtocolService</b><br>
+     <blockquote>Implement existing HAServiceProtocol from Hadoop common. This protocol is the single point of interaction between the RM and HA clients/services.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1022">YARN-1022</a>.
+     Trivial bug reported by Bikas Saha and fixed by haosdent <br>
+     <b>Unnecessary INFO logs in AMRMClientAsync</b><br>
+     <blockquote>Logs like the following should be debug or else every legitimate stop causes unnecessary exception traces in the logs.
+464 2013-08-03 20:01:34,459 INFO [AMRM Heartbeater thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl:            Heartbeater interrupted
+465 java.lang.InterruptedException: sleep interrupted
+466   at java.lang.Thread.sleep(Native Method)
+467   at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:249)
+468 2013-08-03 20:01:34,460 INFO [AMRM Callback Handler Thread] org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl:       Interrupted while waiting for queue
+469 java.lang.InterruptedException
+470   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.     java:1961)
+471   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1996)
+472   at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
+473   at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:275)</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1021">YARN-1021</a>.
+     Major new feature reported by Wei Yan and fixed by Wei Yan (scheduler)<br>
+     <b>Yarn Scheduler Load Simulator</b><br>
+     <blockquote>The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair  schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time and cost consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm for some specific workload would be quite useful.
+We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with reasonable amount of confidence, there-by aiding rapid innovation.
+The simulator will exercise the real Yarn ResourceManager removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AMs heartbeat events from within the same JVM.
+To keep tracking of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler.
+The simulator will produce real time metrics while executing, including:
+* Resource usages for whole cluster and each queue, which can be utilized to configure cluster and queue's capacity.
+* The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the  scheduler behavior (individual jobs turn around time, throughput, fairness, capacity guarantee, etc).
+* Several key metrics of scheduler algorithm, such as time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits.
+The simulator will provide real time charts showing the behavior of the scheduler and its performance.
+A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use simulator to simulate Fair Scheduler and Capacity Scheduler.</blockquote></li>
+<li> <a href="https://issues.apache.org/jira/browse/YARN-1010">YARN-1010</a>.
+     Critical improvement reported by Alejandro Abdelnur and fixed by Wei Yan (scheduler)<br>
+     <b>FairScheduler: decouple container scheduling from nodemanager heartbeats</b><br>
+     <blockquote>Currently scheduling for a node is done when a node heartbeats.
+For large cluster where the heartbeat interval is set to several seconds this delays scheduling of incoming allocations significantly.

[... 1970 lines stripped ...]

View raw message