hadoop-yarn-issues mailing list archives

From "Vinod Kumar Vavilapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1795) Oozie tests are flakey after YARN-713
Date Fri, 07 Mar 2014 02:44:43 GMT

    [ https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13923457#comment-13923457 ]

Vinod Kumar Vavilapalli commented on YARN-1795:
-----------------------------------------------

Per [~sseth], you are likely confusing the ports because this is a MiniYARNCluster
setup, where multiple NMs run on the same machine. The bug seems valid, but the
analysis may not be; I am not completely sure either way. It would be useful if you
could capture the RM logs specifically for this container.
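
One way to narrow a captured RM log down to a single container is to keep only the
lines that mention its container ID. The following is a minimal sketch, not a Hadoop
tool: the class name ContainerLogFilterSketch, the method linesMentioning, and the
command-line arguments (log file path, container ID) are all hypothetical, and it
assumes the relevant RM log lines contain the container ID as plain text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Hadoop): filters log lines by container ID.
class ContainerLogFilterSketch {

    // Returns, in original order, the lines that contain the given container ID.
    static List<String> linesMentioning(List<String> logLines, String containerId) {
        return logLines.stream()
                .filter(line -> line.contains(containerId))
                .collect(Collectors.toList());
    }

    // Assumed usage: java ContainerLogFilterSketch <rm-log-file> <container-id>
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ContainerLogFilterSketch <rm-log-file> <container-id>");
            return;
        }
        List<String> allLines = Files.readAllLines(Paths.get(args[0]));
        for (String line : linesMentioning(allLines, args[1])) {
            System.out.println(line);
        }
    }
}
```

For the failure quoted below, the container ID to pass would be
container_1394064846476_0013_01_000003.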

> Oozie tests are flakey after YARN-713
> -------------------------------------
>
>                 Key: YARN-1795
>                 URL: https://issues.apache.org/jira/browse/YARN-1795
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Robert Kanter
>            Priority: Critical
>
> Running the Oozie unit tests against a Hadoop build with YARN-713 causes many of the
> tests to be flaky. Doing some digging, I found that they were failing because some of the
> MR jobs were failing; I found this in the syslog of the failed jobs:
> {noformat}
> 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1394064846476_0013_m_000000_0: Container launch failed for container_1394064846476_0013_01_000003 : org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent for 192.168.1.77:50759
>        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
>        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:196)
>        at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
>        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
>        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
>        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:744)
> {noformat}
> I did some debugging and found that the NMTokenCache has a different port number than
> what's being looked up.  For example, the NMTokenCache had one token with address 192.168.1.77:58217
> but ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. The 58213
> address comes from ContainerLauncherImpl's constructor. So when the Container is being launched
> it somehow has a different port than when the token was created.
> Any ideas why the port numbers wouldn't match?
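
The mismatch described above can be illustrated with a small stand-in: if tokens are
stored under the exact "host:port" string, a lookup with a different port finds
nothing, even for the same host. This is only a sketch under that assumption and is
not the Hadoop NMTokenCache API; the class NmTokenLookupSketch, its methods putToken
and getToken, and the token value are hypothetical. The two addresses are the ones
from the report.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a token cache keyed by NodeManager address ("host:port").
// It only illustrates the lookup behavior; it is not Hadoop's NMTokenCache.
class NmTokenLookupSketch {
    private final Map<String, String> tokensByNodeAddr = new ConcurrentHashMap<>();

    // Stores a token under the exact address string it was issued for.
    void putToken(String nodeAddr, String token) {
        tokensByNodeAddr.put(nodeAddr, token);
    }

    // Looks up a token by exact address string; empty if no entry matches.
    Optional<String> getToken(String nodeAddr) {
        return Optional.ofNullable(tokensByNodeAddr.get(nodeAddr));
    }

    public static void main(String[] args) {
        NmTokenLookupSketch cache = new NmTokenLookupSketch();
        // Address the token was cached under, per the report.
        cache.putToken("192.168.1.77:58217", "placeholder-token");

        // Same host and port: the entry is found.
        System.out.println(cache.getToken("192.168.1.77:58217").isPresent());
        // Same host, different port (the address being looked up): no entry.
        System.out.println(cache.getToken("192.168.1.77:58213").isPresent());
    }
}
```

With several NMs on one machine, as in a mini cluster, the host part is identical for
every NM, so the port is the only thing that distinguishes one cache entry from another.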



--
This message was sent by Atlassian JIRA
(v6.2#6252)
