hadoop-yarn-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1321) NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly.
Date Sat, 19 Oct 2013 03:21:43 GMT

    [ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799747#comment-13799747 ]

Alejandro Abdelnur commented on YARN-1321:
------------------------------------------

We ran into this issue in Llama. Llama is a single JVM that hosts multiple unmanaged ApplicationMasters
running concurrently. Because NMTokenCache is a singleton, the NMTokens that different AMs receive
for the same node overwrite each other in the shared cache.
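To make the failure mode concrete, here is a minimal sketch, assuming a simplified cache that maps a node address to a token string. `SharedTokenCache`, `SimpleAm`, and the node address `node1:8041` are hypothetical stand-ins, not the YARN classes; the sketch only shows that a process-wide map keyed by node alone cannot tell two AMs apart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a process-wide token cache.
// This is NOT org.apache.hadoop.yarn.client.api.NMTokenCache; tokens are plain
// strings naming the application attempt that owns them.
final class SharedTokenCache {
    private static final SharedTokenCache INSTANCE = new SharedTokenCache();
    private final Map<String, String> tokensByNode = new HashMap<>();

    private SharedTokenCache() {}

    static SharedTokenCache getSingleton() { return INSTANCE; }

    // Keyed only by node address: the owning AM is not part of the key,
    // so a later put for the same node silently replaces an earlier one.
    synchronized void setToken(String nodeAddr, String token) { tokensByNode.put(nodeAddr, token); }

    synchronized String getToken(String nodeAddr) { return tokensByNode.get(nodeAddr); }
}

// Hypothetical AM that stores and reads its tokens through the shared singleton.
final class SimpleAm {
    private final String attemptId;

    SimpleAm(String attemptId) { this.attemptId = attemptId; }

    // Simulates receiving an NMToken for a node from the RM.
    void receiveToken(String nodeAddr) {
        SharedTokenCache.getSingleton().setToken(nodeAddr, "NMToken[" + attemptId + "]");
    }

    // Simulates looking up the token to present when starting a container on the node.
    String tokenForStart(String nodeAddr) {
        return SharedTokenCache.getSingleton().getToken(nodeAddr);
    }
}
```

With two such AMs in one JVM, the second `receiveToken` for `node1:8041` replaces the first, so the first AM presents the other attempt's token when it starts a container. That is the same kind of attempt mismatch this issue reports.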

The patch I'm working on preserves the current behavior (a singleton NMTokenCache) while allowing
a client to set an NMTokenCache instance on the AMRMClient/NMClient (and their Async versions). If
an instance is set, the NMTokens are stored in it instead of in the singleton. This preserves
backward compatibility in both behavior and API.
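The approach described above can be sketched as a client that defaults to the shared cache but accepts its own instance. This is a minimal model under stated assumptions: `TokenCache`, `AmClient`, `setTokenCache`, and the string tokens are hypothetical and need not match what the patch actually adds to AMRMClient/NMClient.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified token cache; not the real YARN NMTokenCache API.
final class TokenCache {
    private static final TokenCache SINGLETON = new TokenCache();
    private final Map<String, String> tokensByNode = new HashMap<>();

    // Non-private constructor so callers can create private instances.
    TokenCache() {}

    static TokenCache getSingleton() { return SINGLETON; }

    synchronized void setToken(String nodeAddr, String token) { tokensByNode.put(nodeAddr, token); }

    synchronized String getToken(String nodeAddr) { return tokensByNode.get(nodeAddr); }
}

// Hypothetical client standing in for AMRMClient/NMClient.
final class AmClient {
    // Defaults to the shared singleton, so existing callers see unchanged behavior.
    private TokenCache cache = TokenCache.getSingleton();

    // Opt-in: an AM sharing the JVM with other AMs sets its own instance.
    void setTokenCache(TokenCache cache) {
        if (cache == null) throw new IllegalArgumentException("cache must not be null");
        this.cache = cache;
    }

    TokenCache getTokenCache() { return cache; }

    void receiveToken(String nodeAddr, String token) { cache.setToken(nodeAddr, token); }

    String tokenForStart(String nodeAddr) { return cache.getToken(nodeAddr); }
}
```

An AM that never calls `setTokenCache` keeps using the singleton, which is the backward-compatible path. Each AM in a multi-AM JVM gets its own `TokenCache`, so tokens for the same node no longer collide. In this model, an AM's AMRMClient and NMClient would need to share the same instance so that tokens received from the RM are visible when starting containers.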




> NMTokenCache is a singleton, prevents multiple AMs running in a single JVM to work correctly.
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-1321
>                 URL: https://issues.apache.org/jira/browse/YARN-1321
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.2.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 2.2.1
>
>
> NMTokenCache is a singleton. Because of this, when multiple AMs run in a single JVM, NMTokens
> for the same node from different AMs step on each other, and container starts fail due to
> mismatched tokens.
> The error observed on the client side is something like:
> {code}
> ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container. 
> NMToken for application attempt : appattempt_1382038445650_0002_000001 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_000001
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
