Date: Wed, 26 Feb 2014 22:20:24 +0000 (UTC)
From: "Bikas Saha (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

    [ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913612#comment-13913612 ]

Bikas Saha commented on YARN-1410:
----------------------------------

Can we do this in the RM? Then if we are going to retry the submitApplication RPC we might be better off. The response of a successful submitApplication() gives us the applicationId to be used to do the step 2 getApplicationReport. This also saves 1 RPC hop.
{code}
     ApplicationId applicationId = appContext.getApplicationId();
-    appContext.setApplicationId(applicationId);
+    if (applicationId == null) {
+      applicationId = getNewApplication().getApplicationId();
+      appContext.setApplicationId(applicationId);
+    }
{code}

Please add comments to the test explaining what is being tested, e.g. that the failed-over RM accepts the appId in the submitContext even though it does not exist internally.

If the test is doing failover in the test method, then why is submitApplication(ApplicationId oldAppId) also causing failover?

Looks like the RM already blindly accepts the appId present in the submitContext.

Please change the title of the jira and link it to the other 2 jiras.

> Handle client failover during 2 step client API's like app submission
> ---------------------------------------------------------------------
>
>                 Key: YARN-1410
>                 URL: https://issues.apache.org/jira/browse/YARN-1410
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Xuan Gong
>         Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, YARN-1410.7.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> App submission involves
> 1) creating appId
> 2) using that appId to submit an ApplicationSubmissionContext to the RM.
> The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM.
> Since the new RM has a different notion of cluster timestamp (used to create app id) the new RM may reject the app submission resulting in unexpected failure on the client side.
> The same may happen for other 2 step client API operations.
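
For illustration only, the client-side check from the diff and the failover case from the issue description can be sketched as a small self-contained toy model. Every name here (AppId, SubmissionContext, ToyResourceManager, submit) is a hypothetical stand-in and not a YARN class. The toy RM accepts whatever id the context carries, which is an assumption based on the observation that the RM appears to accept the appId in the submitContext blindly.

```java
import java.util.HashSet;
import java.util.Set;

public class SubmitFailoverSketch {

    /** Hypothetical app id: the issuing RM's cluster timestamp plus a sequence number. */
    static final class AppId {
        final long clusterTimestamp;
        final int sequence;

        AppId(long clusterTimestamp, int sequence) {
            this.clusterTimestamp = clusterTimestamp;
            this.sequence = sequence;
        }

        @Override
        public String toString() {
            return "application_" + clusterTimestamp + "_" + sequence;
        }
    }

    /** Hypothetical stand-in for the submission context; it only carries the id. */
    static final class SubmissionContext {
        private AppId appId;

        AppId getApplicationId() { return appId; }

        void setApplicationId(AppId appId) { this.appId = appId; }
    }

    /** Toy RM (assumption): issues ids from its own timestamp, accepts any id it is handed. */
    static final class ToyResourceManager {
        private final long clusterTimestamp;
        private int nextSequence = 1;
        private final Set<String> submitted = new HashSet<>();

        ToyResourceManager(long clusterTimestamp) {
            this.clusterTimestamp = clusterTimestamp;
        }

        // Step 1 of the two-step API: hand out a fresh id.
        AppId getNewApplication() {
            return new AppId(clusterTimestamp, nextSequence++);
        }

        // Step 2: record the submission under the id found in the context.
        AppId submitApplication(SubmissionContext context) {
            AppId id = context.getApplicationId();
            if (id == null) {
                throw new IllegalArgumentException("context has no application id");
            }
            submitted.add(id.toString());
            return id;
        }

        boolean knows(AppId id) {
            return submitted.contains(id.toString());
        }
    }

    /** Client-side step mirroring the diff: request a new id only when the context has none. */
    static AppId submit(ToyResourceManager rm, SubmissionContext context) {
        AppId id = context.getApplicationId();
        if (id == null) {
            id = rm.getNewApplication();
            context.setApplicationId(id);
        }
        return rm.submitApplication(context);
    }

    public static void main(String[] args) {
        ToyResourceManager oldRm = new ToyResourceManager(1000L);
        ToyResourceManager newRm = new ToyResourceManager(2000L);

        // The client obtained its id from the old RM before the failover.
        SubmissionContext context = new SubmissionContext();
        context.setApplicationId(oldRm.getNewApplication());

        // After failover the same context goes to the new RM and the id is reused.
        AppId accepted = submit(newRm, context);
        System.out.println(accepted + " accepted by new RM: " + newRm.knows(accepted));
    }
}
```

In this sketch a retry after failover keeps the id issued by the old RM because the null check skips the second getNewApplication() call. Whether the real RM should accept such an id is the question the comments above raise.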