Date: Tue, 5 Nov 2013 23:58:17 +0000 (UTC)
From: "Bikas Saha (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814425#comment-13814425 ]

Bikas Saha commented on YARN-1222:
----------------------------------

REQUEST_BY_USER_FORCED is probably not the right choice.
{code}
+ target.getProxy(getConfig(), 1000).transitionToStandby(
+     new HAServiceProtocol.StateChangeRequestInfo(
+         HAServiceProtocol.RequestSource.REQUEST_BY_USER_FORCED));
+ } catch (IOException e) {
{code}

There are finally blocks that call methods like notifyDoneStoringApplicationAttempt(). These end up sending events to the RM modules, which check for the exception and then call terminate for the RM Java process. We probably don't want that to happen, since we simply want to transitionToStandby and discard all the internal state.

Thinking aloud, using HAServiceTarget in RMStateStore to transitionToStandby() may not be the right solution. We are effectively doing an internal RPC on an ACL'd protocol. Is it guaranteed to succeed? Should we think of sending an event to the HAProtocolService, or of holding a reference to the HAProtocolService, so that it can be directly notified about this situation? Then the HAProtocolService may transition to standby internally. The store should inform the higher entity about the fenced state and not take action on the higher entity by fencing it. Thoughts?

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch, yarn-1222-5.patch
>
>
> Using multi-operations for every ZK interaction.
> In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.
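
To make the alternative raised in the comment concrete (the store reports the fenced state upward and the higher entity decides to go to standby in-process, with no RPC), here is a minimal sketch. Every type and method name in it (FencedListener, SketchStateStore, SketchHaService, onStoreFenced, simulateFencing) is hypothetical and invented for illustration; none of them is taken from the YARN-1222 patches or from the Hadoop code base, and the real store and HA service have far more state to unwind.

```java
// Hypothetical sketch only: none of these types exist in Hadoop YARN.

/** Callback through which a store reports that it has been fenced. */
interface FencedListener {
    void onStoreFenced(String reason);
}

/** The two HA states this sketch cares about. */
enum HaState { ACTIVE, STANDBY }

/** Stand-in for a state store whose writes are rejected once it is fenced. */
class SketchStateStore {
    private final FencedListener listener;
    private boolean fenced = false;

    SketchStateStore(FencedListener listener) {
        this.listener = listener;
    }

    /** Simulates another RM taking over the store (assumption for the demo). */
    void simulateFencing() {
        fenced = true;
    }

    /**
     * Returns true if the write was applied. When fenced, the store only
     * informs the listener; it does not transition anything itself.
     */
    boolean store(String key, String value) {
        if (fenced) {
            listener.onStoreFenced("write to " + key + " rejected: store is fenced");
            return false;
        }
        return true;
    }
}

/** Stand-in for the higher-level HA service that owns the state transition. */
class SketchHaService implements FencedListener {
    private HaState state = HaState.ACTIVE;

    @Override
    public synchronized void onStoreFenced(String reason) {
        // The HA service, not the store, decides how to react. Here it simply
        // moves to standby internally, without going through an RPC proxy.
        if (state == HaState.ACTIVE) {
            state = HaState.STANDBY;
        }
    }

    synchronized HaState getState() {
        return state;
    }
}

class FencedNotificationSketch {
    public static void main(String[] args) {
        SketchHaService ha = new SketchHaService();
        SketchStateStore store = new SketchStateStore(ha);

        System.out.println(store.store("app_1", "data") + " " + ha.getState());
        store.simulateFencing();
        System.out.println(store.store("app_2", "data") + " " + ha.getState());
    }
}
```

The point of the shape is the direction of the dependency: the store holds only a narrow callback, so it never needs an HAServiceTarget or ACL'd protocol access, and the process-terminating paths mentioned in the comment could be bypassed by whatever the listener chooses to do.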