hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6834) MR application fails with "No NMToken sent" exception after MRAppMaster recovery
Date Wed, 01 Mar 2017 14:54:45 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890310#comment-15890310
] 

Jason Lowe commented on MAPREDUCE-6834:
---------------------------------------

Yes, which is why I've been asking all along if this is a case where MapReduce has been modified
to preserve containers across AM attempts.  If that is indeed the case then this is essentially
a bug report against an internal patch that is not part of Apache Hadoop.

> MR application fails with "No NMToken sent" exception after MRAppMaster recovery
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6834
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6834
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: resourcemanager, yarn
>    Affects Versions: 2.7.0
>         Environment: Centos 7
>            Reporter: Aleksandr Balitsky
>            Assignee: Aleksandr Balitsky
>            Priority: Critical
>         Attachments: YARN-6019.001.patch
>
>
> *Steps to reproduce:*
> 1) Submit MR application (for example PI app with 50 containers)
> 2) Find MRAppMaster process id for the application 
> 3) Kill MRAppMaster by kill -9 command
> *Expected:* ResourceManager launch new MRAppMaster container and MRAppAttempt and application
finish correctly
> *Actually:* After launching new MRAppMaster and MRAppAttempt the application fails with
the following exception:
> {noformat}
> 2016-12-22 23:17:53,929 ERROR [ContainerLauncher #9] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl:
Container launch failed for container_1482408247195_0002_02_000011 : org.apache.hadoop.security.token.SecretManager$InvalidToken:
No NMToken sent for node1:43037
> 	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:254)
> 	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.<init>(ContainerManagementProtocolProxy.java:244)
> 	at org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:129)
> 	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:395)
> 	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
> 	at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:361)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> *Problem*:
> When RMCommunicator sends "registerApplicationMaster" request to RM, RM generates NMTokens
for new RMAppAttempt. Those new NMTokens are transmitted to RMCommunicator in RegisterApplicationMasterResponse
 (getNMTokensFromPreviousAttempts method). But we don't handle these tokens in RMCommunicator.register
method. RM don't transmit tese tokens again for other allocated requests, but we don't have
these tokens in NMTokenCache. Accordingly we get "No NMToken sent for node" exception.
> I have found that this issue appears after changes from the https://github.com/apache/hadoop/commit/9b272ccae78918e7d756d84920a9322187d61eed

> I tried to do the same scenario without the commit and application completed successfully
after RMAppMaster recovery



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org


Mime
View raw message