Mailing-List: contact yarn-dev-help@hadoop.apache.org; run by ezmlm
MIME-Version: 1.0
References: <2023140125.1703150.1489700071731.ref@mail.yahoo.com> <2023140125.1703150.1489700071731@mail.yahoo.com>
From: Sergiy Matusevych
Date: Fri, 17 Mar 2017 17:55:37 -0700
Subject: Re: Two AMs in one YARN container?
To: dev@reef.apache.org, subru@apache.org
Cc: Arun Suresh, Jason Lowe, "yarn-dev@hadoop.apache.org", Chris Douglas, Markus Weimer, Daryn Sharp, Botong Huang
Content-Type: multipart/alternative; boundary=f403045f1a5c6969fe054af6c284
archived-at: Sat, 18 Mar 2017 00:56:19 -0000

--f403045f1a5c6969fe054af6c284
Content-Type: text/plain; charset=UTF-8

On Fri, Mar 17, 2017 at 4:15 PM, Subru Krishnan wrote:

> Thanks Arun for the heads-up.
>
> Hi Sergiy,
>
> We do run a UAM pool under one process (AMRMProxyService in NM) as that's
> the mechanism we use to span a single job across multiple clusters that are
> under federation. This is achieved by using the doAs method in
> UserGroupInformation, exactly as Jason pointed out.
>
> The e2e *prototype* code (and docs/slides) is available in the Federation
> umbrella jira:
> https://issues.apache.org/jira/browse/YARN-2915
>
> I have created a utility class that's used throughout YARN Federation to
> create RMProxies per UGI - FederationProxyProviderUtil
> <...yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/federation/failover/FederationProxyProviderUtil.java>
> (as part of YARN-3673), which should provide a good starting point for you.
>
> You should also keep an eye on the UAM pool JIRA which Botong is working on
> right now:
> https://issues.apache.org/jira/browse/YARN-5531

Hi YARN devs,

*Huge* thanks for your help! If I understand you correctly, I do not need
any changes to the YARN client API to run multiple AMs in one process, which
is excellent news! I will study the federation code and try that technique
in REEF. I'll let you know how it goes.

Again, thanks a lot Subru, Arun, and Jason -- you guys are awesome :)

Cheers,
Sergiy.

> On Thu, Mar 16, 2017 at 2:49 PM, Arun Suresh wrote:
>
> > Hey Sergiy,
> >
> > I think a similar approach IIUC, where an AM for an app running on a
> > cluster acts as an unmanaged AM on another cluster. I believe they use a
> > separate UGI for each sub-cluster and wrap it around a doAs before the
> > actual allocate call.
> >
> > Subru might be able to give more details.
> >
> > Cheers
> > -Arun
> >
> > On Thu, Mar 16, 2017 at 2:34 PM, Jason Lowe wrote:
> >
> >> The doAs method in UserGroupInformation is what you want when dealing
> >> with multiple UGIs. It determines what UGI instance the code within the
> >> doAs scope gets when that code tries to look up the current user.
> >>
> >> Each AM is designed to run in a separate JVM, so each has some
> >> main()-like entry point that does everything to set up the AM.
> >> Theoretically all you need to do is create two separate UGIs, then use
> >> each instance to perform a doAs wrapping the invocation of the
> >> corresponding AM's entry point. After that, everything that AM does will
> >> get the UGI of the doAs invocation as the current user. Since the AMs are
> >> running in separate doAs instances they will get separate UGIs for the
> >> current user and thus separate credentials.
> >>
> >> Jason
> >>
> >>
> >> On Thursday, March 16, 2017 4:03 PM, Sergiy Matusevych
> >> <sergiy.matusevych@gmail.com> wrote:
> >>
> >>
> >> Hi Jason,
> >>
> >> Thanks a lot for your help again! Having two separate
> >> UserGroupInformation instances is exactly what I had in mind. What I do
> >> not understand, though, is how to make sure that our second call to
> >> .registerApplicationMaster() will pick the right UserGroupInformation
> >> object. I would love to find a way that does not involve any changes to
> >> the YARN client, but if we have to patch it, of course, I agree that we
> >> need to have a generic yet minimally invasive solution.
> >>
> >> Thank you!
> >> Sergiy.
> >>
> >>
> >> On Thu, Mar 16, 2017 at 8:03 AM, Jason Lowe wrote:
> >> >
> >> > I believe a cleaner way to solve this problem is to create two
> >> > _separate_ UserGroupInformation objects and wrap each AM instance in a
> >> > UGI doAs so they aren't trying to share the same credentials. This is
> >> > one example of a token bleeding over and causing problems. I suspect
> >> > trying to fix these one-by-one as they pop up is going to be
> >> > frustrating compared to just ensuring the credentials remain separate
> >> > as if they really were running in separate JVMs.
> >> >
> >> > Adding Daryn who knows a lot more about the UGI stuff so he can
> >> > correct any misunderstandings on my part.
> >> >
> >> > Jason
> >> >
> >> >
> >> > On Wednesday, March 15, 2017 1:11 AM, Sergiy Matusevych
> >> > <sergiy.matusevych@gmail.com> wrote:
> >> >
> >> >
> >> > Hi YARN developers,
> >> >
> >> > I have an interesting problem that I think is related to the YARN Java
> >> > client. I am trying to launch *two* application masters in one
> >> > container. To be more specific, I am starting a Spark job on YARN, and
> >> > launch an Apache REEF Unmanaged AM from the Spark Driver.
> >> >
> >> > Technically, YARN Resource Manager should not care which process each
> >> > AM runs in. However, there is a problem with the YARN Java client
> >> > implementation: there is a global UserGroupInformation object that
> >> > holds the user credentials of the current RM session. This data
> >> > structure is shared by all AMs, and when the REEF application tries to
> >> > register the second (unmanaged) AM, the client library presents to YARN
> >> > RM all credentials, including the security token of the first (managed)
> >> > AM. YARN rejects such a registration request, throwing
> >> > InvalidApplicationMasterRequestException "Application Master is already
> >> > registered".
> >> >
> >> > I feel like this issue can be resolved by a relatively small update to
> >> > the YARN Java client - e.g. by introducing a new variant of
> >> > AMRMClientAsync.registerApplicationMaster() that would take the
> >> > required security token (instead of getting it implicitly from
> >> > UserGroupInformation.getCurrentUser().getCredentials() etc.), or having
> >> > some sort of RM session class that would wrap all data that is
> >> > currently global. I need to think about an elegant API for it.
> >> >
> >> > What do you guys think? I would love to work on this problem and send
> >> > you a pull request for the upcoming 2.9 release.
> >> >
> >> > Cheers,
> >> > Sergiy.

--f403045f1a5c6969fe054af6c284--
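
The doAs scoping that Jason describes in the thread can be illustrated with a
small JDK-only toy model. This is only a sketch under stated assumptions: it
does not use Hadoop, and every name in it (ScopedUserDemo, ToyUser,
currentUser, register) is hypothetical. The real UserGroupInformation uses a
different mechanism internally; the sketch models only the observable
behaviour discussed above, namely that code running inside a doAs scope sees
that scope's user, and therefore that user's tokens, when it looks up the
current user.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Toy model only: NOT Hadoop's UserGroupInformation. All names are hypothetical.
class ScopedUserDemo {

    /** Stand-in for a UGI: a user name plus that user's own token list. */
    static final class ToyUser {
        final String name;
        final List<String> tokens = new ArrayList<>();

        ToyUser(String name) {
            this.name = name;
        }

        /** Runs the action with this user as the current user on this thread. */
        <T> T doAs(Supplier<T> action) {
            Deque<ToyUser> scope = SCOPE.get();
            scope.push(this);
            try {
                return action.get();
            } finally {
                scope.pop(); // restore whatever user was current before
            }
        }
    }

    /** Per-thread stack of active doAs scopes (an assumption of this toy model). */
    private static final ThreadLocal<Deque<ToyUser>> SCOPE =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Fallback user when no doAs scope is active, like a process-wide login user. */
    static final ToyUser PROCESS_USER = new ToyUser("process");

    /** Returns the innermost doAs user, or the process-wide user outside any scope. */
    static ToyUser currentUser() {
        ToyUser user = SCOPE.get().peek();
        return user != null ? user : PROCESS_USER;
    }

    /** Stand-in for an AM registration call that implicitly reads the current user's tokens. */
    static String register() {
        ToyUser user = currentUser();
        return user.name + ":" + user.tokens;
    }

    public static void main(String[] args) {
        ToyUser managed = new ToyUser("managed-am");
        managed.tokens.add("amrm-token-1");
        ToyUser unmanaged = new ToyUser("unmanaged-am");
        unmanaged.tokens.add("amrm-token-2");

        // Each "AM entry point" runs inside its own scope and sees only its own token.
        System.out.println(managed.doAs(ScopedUserDemo::register));   // managed-am:[amrm-token-1]
        System.out.println(unmanaged.doAs(ScopedUserDemo::register)); // unmanaged-am:[amrm-token-2]
        System.out.println(register());                               // process:[]
    }
}
```

In this model the second registration never sees the first AM's token, because
the implicit lookup is scoped by doAs rather than shared process-wide. Whether
the same holds for every code path in the real YARN client is something the
thread leaves to be verified against the federation code.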