Date: Fri, 7 Mar 2014 02:44:43 +0000 (UTC)
From: "Vinod Kumar Vavilapalli (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1795) Oozie tests are flakey after YARN-713

    [ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923457#comment-13923457 ]

Vinod Kumar Vavilapalli commented on YARN-1795:
-----------------------------------------------

Per [~sseth], is it likely that you are confusing the ports because this is a MiniYarnCluster setup where multiple NMs run on the same machine? The bug seems valid, but maybe the analysis isn't. I'm not completely sure either way. It would be useful if you could capture the RM logs specifically for this container.

> Oozie tests are flakey after YARN-713
> -------------------------------------
>
>                 Key: YARN-1795
>                 URL: https://issues.apache.org/jira/browse/YARN-1795
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Robert Kanter
>            Priority: Critical
>
> Running the Oozie unit tests against a Hadoop build with YARN-713 makes many of the tests flaky. After some digging, I found that they fail because some of the MR jobs fail. I found this in the syslog of the failed jobs:
> {noformat}
> 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_000000_0: Container launch failed for container_1394064846476_0013_01_000003 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759
> 	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
> 	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:196)
> 	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
> 	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
> 	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
> 	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:744)
> {noformat}
> I did some debugging and found that the NMTokenCache holds a different port number than the one being looked up. For example, the NMTokenCache had one token with address 192.168.1.77:58217, but ContainerManagementProtocolProxy.java:119 was looking for 192.168.1.77:58213. The 58213 address comes from ContainerLauncherImpl's constructor. So when the container is launched, it somehow has a different port than when the token was created.
> Any ideas why the port numbers wouldn't match?
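
To make the reported symptom concrete, here is a minimal sketch of how an exact-match lookup keyed by "host:port" misses when only the port differs. The class TokenCacheSketch and its methods are hypothetical stand-ins written for illustration; they are not the real NMTokenCache or ContainerManagementProtocolProxy API. Only the two addresses and the wording of the error message are taken from the report.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenCacheSketch {
    // Hypothetical stand-in for a per-NM token cache, keyed by "host:port".
    // This is an assumption for illustration, not Hadoop's implementation.
    private final Map<String, String> tokensByAddress = new HashMap<>();

    public void setToken(String nodeAddress, String token) {
        tokensByAddress.put(nodeAddress, token);
    }

    // Exact-match lookup: the same host with a different port is a miss.
    public String getToken(String nodeAddress) {
        String token = tokensByAddress.get(nodeAddress);
        if (token == null) {
            throw new IllegalStateException("No NMToken sent for " + nodeAddress);
        }
        return token;
    }

    public static void main(String[] args) {
        TokenCacheSketch cache = new TokenCacheSketch();
        // Address under which the token was stored, per the report.
        cache.setToken("192.168.1.77:58217", "token-for-58217");
        try {
            // Address used at container launch, per the report.
            cache.getToken("192.168.1.77:58213");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If several NMs share one host, as in a mini-cluster test setup, every key has the same IP and differs only by port, so a launch request that carries another NM's port would fail this way even though a token for the host exists.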