Date: Sat, 19 Oct 2013 18:08:47 +0000 (UTC)
From: "Vinod Kumar Vavilapalli (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1321) NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly

[ https://issues.apache.org/jira/browse/YARN-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799963#comment-13799963 ]

Vinod Kumar Vavilapalli commented on YARN-1321:
-----------------------------------------------

bq. Llama is a single JVM hosting multiple unmanaged ApplicationMasters that run at the same time (in parallel). Because NMTokenCache is a singleton, NMTokens for the same node from different AMs step on each other.

Okay, that explains the context.

bq. So far this is the only issue we've run into while using multiple AMs in a single JVM.

That is good to know. You should add a simple test so that this assumption isn't broken in the future.

bq. It seems that after this patch goes in, all applications will need to change to work correctly with the client libraries?

Sigh, that is true. Changing the APIs from static to non-static breaks existing apps. We can do one of two things:
- Keep the statics around for the single-AM-per-JVM case, which I believe covers 99% of cases, and add new non-static APIs; or
- Do what Omkar is suggesting: add optional APIs to track NMTokens per application attempt.

Irrespective of the solution, I think we should skip the MR and dist-shell changes altogether, at least to prove that the changes are compatible. We can maybe fix them in a follow-up ticket.

> NMTokenCache is a a singleton, prevents multiple AMs running in a single JVM to work correctly
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-1321
> URL: https://issues.apache.org/jira/browse/YARN-1321
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Affects Versions: 2.2.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Priority: Blocker
> Fix For: 2.2.1
>
> Attachments: YARN-1321.patch
>
>
> NMTokenCache is a singleton. Because of this, when multiple AMs run in a single JVM, NMTokens for the same node from different AMs step on each other, and starting containers fails due to mismatched tokens.
> The error observed on the client side is something like:
> {code}
> ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:llama (auth:PROXY) via llama (auth:SIMPLE) cause:org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
> NMToken for application attempt : appattempt_1382038445650_0002_000001 was used for starting container with container token issued for application attempt : appattempt_1382038445650_0001_000001
> {code}
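The token clash described in the issue, and the first option in the comment (keep the statics for the single-AM case, add non-static APIs), can be illustrated with a toy cache. This is a minimal sketch under stated assumptions, not Hadoop code: `TokenCache`, `TokenCacheDemo`, their methods, and the node address `node1:8041` are hypothetical, and tokens are plain strings rather than real NMToken objects. The static methods delegate to one shared default instance, while new instances give each AM its own map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for an NM token cache; NOT the real
// org.apache.hadoop.yarn.client.api.NMTokenCache. Tokens are plain strings.
class TokenCache {
    // Shared instance behind the legacy static API (single AM per JVM).
    private static final TokenCache DEFAULT = new TokenCache();

    // NodeManager address -> NMToken, scoped to this cache instance.
    private final Map<String, String> tokens = new ConcurrentHashMap<>();

    static TokenCache getDefault() {
        return DEFAULT;
    }

    // Legacy static API: existing callers keep compiling and behaving as before,
    // but every AM in the JVM shares (and can overwrite) the same map.
    static void setToken(String nodeAddr, String token) {
        DEFAULT.put(nodeAddr, token);
    }

    static String getToken(String nodeAddr) {
        return DEFAULT.get(nodeAddr);
    }

    // New non-static API: each AM can hold its own instance.
    void put(String nodeAddr, String token) {
        tokens.put(nodeAddr, token);
    }

    String get(String nodeAddr) {
        return tokens.get(nodeAddr);
    }
}

class TokenCacheDemo {
    // Two AMs in one JVM store NMTokens for the same (hypothetical) node via the
    // static API: the second write replaces the first, so AM 0001 would present
    // AM 0002's token when starting its container.
    static String sharedCacheToken() {
        TokenCache.setToken("node1:8041", "nmtoken-appattempt-0001");
        TokenCache.setToken("node1:8041", "nmtoken-appattempt-0002");
        return TokenCache.getToken("node1:8041");
    }

    // With one cache instance per AM, the same sequence keeps both tokens intact.
    static String[] perInstanceTokens() {
        TokenCache am1 = new TokenCache();
        TokenCache am2 = new TokenCache();
        am1.put("node1:8041", "nmtoken-appattempt-0001");
        am2.put("node1:8041", "nmtoken-appattempt-0002");
        return new String[] {am1.get("node1:8041"), am2.get("node1:8041")};
    }
}
```

Because the static methods only forward to `getDefault()`, single-AM applications see no behavioural change, while a multi-AM host such as Llama would opt into separate instances.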
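The second option in the comment, tracking NMTokens per application attempt, might look roughly like the following. This is a hypothetical sketch, not an existing YARN API: `PerAttemptTokenCache` and its methods are illustrative names, attempt IDs and tokens are plain strings, and the node address is made up. A single JVM-wide cache remains possible because every lookup is keyed by both the application attempt and the node.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one JVM-wide cache that keeps NMTokens apart per
// application attempt, so AMs sharing a JVM cannot overwrite each other's tokens.
class PerAttemptTokenCache {
    // appAttemptId -> (NodeManager address -> NMToken)
    private final Map<String, Map<String, String>> byAttempt = new ConcurrentHashMap<>();

    void setToken(String appAttemptId, String nodeAddr, String token) {
        byAttempt.computeIfAbsent(appAttemptId, id -> new ConcurrentHashMap<>())
                 .put(nodeAddr, token);
    }

    // Returns null if this attempt has no token for the node.
    String getToken(String appAttemptId, String nodeAddr) {
        Map<String, String> tokens = byAttempt.get(appAttemptId);
        return tokens == null ? null : tokens.get(nodeAddr);
    }

    // Drops every token of an attempt, e.g. after its AM unregisters.
    void removeAttempt(String appAttemptId) {
        byAttempt.remove(appAttemptId);
    }
}
```

Compared with per-instance caches, this keeps one shared object but requires callers to pass the attempt ID, which is why such methods would have to be optional additions to stay compatible with existing single-AM code.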