Date: Wed, 10 May 2017 22:30:04 +0000 (UTC)
From: "Karthik Kambatla (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5531) UnmanagedAM pool manager for federating application across clusters

[ https://issues.apache.org/jira/browse/YARN-5531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005569#comment-16005569 ]

Karthik Kambatla commented on YARN-5531:
----------------------------------------

Thanks for working on this, [~botong]. I took a close look at the new files and skimmed through the rest. Comments:

# Is yarn-server the best place for these? In the future, don't we want other clients to use this UAMPool? If we do change it to a different package, we need to think about the Visibility and Stability annotations.
# UnmanagedAMPoolManager:
## The create methods seem to expect an AppAttemptId from the user. Is that reasonable? Should it be the other way round, where we give the user the AppAttemptId for the newly created app?
## What are the benefits of using maps keyed by a String passed by the user? Why not just use ApplicationAttemptId? The create methods could just return the app-attempt.
## Nit: In serviceStart, when creating maps, no need to specify the types starting with Java 7.
## In serviceStart and serviceStop, shouldn't we call the equivalent super methods right at the end? Otherwise, the state machine would transition the service to INITED or STOPPED even if it is not fully in that state.
## serviceStop
### I see the code tries to parallelize killing AMs. Is this necessary? How bad is sequential killing of apps?
### Nit: ExecutorCompletionService doesn't need the type in the creation.
### Why do we need the lock on the uamMap?
### Nit: Style choice. Where possible, I like to avoid nesting. The isEmpty check is only for the logging. Can we avoid nesting the for loop inside it?
### If we fail to kill the application, is catching the exception enough? Is there merit to retrying? Should we capture this state and throw an exception past this loop?
## createUAM should be annotated @VisibleForTesting
## Nit: allocateAsync: Don't see the need for variable uam.
## finishAM
### Nit: Don't see the need for variable uam.
### Don't we need to handle the case where the app is still registered? Retry?
# UnmanagedApplicationManager
## Should this class be called UnmanagedApplicationMaster?
## Constructor: Don't need to specify the type when creating LinkedBlockingQueue.
## UnmanagedAMLauncher
### It is not clear to me that this needs to be a separate inner class, outside of grouping methods that create an AM.
### submitAndGetAppId doesn't seem to really get the app id?
### Why not use YarnClient? I understand this UAM pool is currently in yarn-server, but once we move this out, it should be easier.
### Would it be possible to have a single monitor method?
### Isn't one second too long a wait in monitor* methods?
## UnmanagedAMIdentifier can be private, and so can its methods.
## CallbackHandlerThread
### Can the combination of requestQueue and CallbackHandlerThread be achieved using a dispatcher?
### Should this thread be named HeartbeatHandlerThread or AMRequestHandlerThread? The thread is processing requests.
### We seem to throw RuntimeExceptions.
Should these be YarnExceptions instead?
### Since the thread can crash, wouldn't it be nicer to implement an UncaughtExceptionHandler for this thread?
## finishApplicationMaster
### Can the two {{if (rmProxy == null)}} checks be merged into one?
### Should the {{rmProxy.finishApplicationMaster}} call be in a loop? Or is one check and re-register enough?
## allocateAsync
### Is it okay to ignore the InterruptedException?
### The warning on UAM not being launched/registered seems unnecessary.
### Should the {{rmProxy == null && registerRequest == null}} check come first, before we even queue this request?

> UnmanagedAM pool manager for federating application across clusters
> -------------------------------------------------------------------
>
> Key: YARN-5531
> URL: https://issues.apache.org/jira/browse/YARN-5531
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Reporter: Subru Krishnan
> Assignee: Botong Huang
> Attachments: YARN-5531-YARN-2915.v10.patch, YARN-5531-YARN-2915.v1.patch, YARN-5531-YARN-2915.v2.patch, YARN-5531-YARN-2915.v3.patch, YARN-5531-YARN-2915.v4.patch, YARN-5531-YARN-2915.v5.patch, YARN-5531-YARN-2915.v6.patch, YARN-5531-YARN-2915.v7.patch, YARN-5531-YARN-2915.v8.patch, YARN-5531-YARN-2915.v9.patch
>
>
> One of the main tenets of YARN Federation is to *transparently* scale applications across multiple clusters. This is achieved by running UAMs on behalf of the application on other clusters. This JIRA tracks the addition of an UnmanagedAM pool manager for federating applications across clusters, which will be used by the FederationInterceptor (YARN-3666), which is part of the AMRMProxy pipeline introduced in YARN-2884.
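
A few of the Java points raised in the review comment (the Java 7 diamond operator, parallel kills through an ExecutorCompletionService with failures captured rather than swallowed, an early return in place of a nested loop, and an UncaughtExceptionHandler on the handler thread) can be sketched with the JDK alone. This is only an illustration under stated assumptions: the class ReviewSketch and the methods killUam, killAll and runHandlerThread are hypothetical names and do not come from the patch, and no Hadoop APIs are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

final class ReviewSketch {

  // Hypothetical stand-in for killing one UAM: ids starting with "bad" fail.
  static void killUam(String uamId) throws Exception {
    if (uamId.startsWith("bad")) {
      throw new Exception("could not kill " + uamId);
    }
  }

  // Kills all UAMs in parallel and returns the sorted ids that failed,
  // so the caller can decide whether to retry or throw after the loop.
  static List<String> killAll(List<String> uamIds) {
    List<String> failed = new ArrayList<>();            // diamond operator
    if (uamIds.isEmpty()) {
      return failed;                                     // early return, no nesting
    }
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.min(uamIds.size(), 4));
    ExecutorCompletionService<String> completion =
        new ExecutorCompletionService<>(pool);           // diamond operator
    try {
      for (String id : uamIds) {
        // Each task returns null on success or the id on failure.
        completion.submit(() -> {
          try {
            killUam(id);
            return null;
          } catch (Exception e) {
            return id;
          }
        });
      }
      for (int i = 0; i < uamIds.size(); i++) {
        String failedId = completion.take().get();
        if (failedId != null) {
          failed.add(failedId);
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();                // preserve interrupt status
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdownNow();
    }
    Collections.sort(failed);
    return failed;
  }

  // Runs the body on a named thread and returns whatever escaped it, or null.
  static Throwable runHandlerThread(Runnable body) {
    AtomicReference<Throwable> seen = new AtomicReference<>();
    Thread t = new Thread(body, "AMRequestHandlerThread");
    t.setUncaughtExceptionHandler((thread, error) -> seen.set(error));
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return seen.get();
  }

  public static void main(String[] args) {
    System.out.println(killAll(Arrays.asList("uam-1", "bad-2", "uam-3")));
    System.out.println(runHandlerThread(() -> {
      throw new RuntimeException("boom");
    }));
  }
}
```

Returning the failed ids keeps the retry-or-throw decision with the caller, which is one possible answer to the question about capturing the failure state past the loop. A sequential loop over killUam would give the same result with less machinery if kill latency is acceptable.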