Return-Path: X-Original-To: apmail-hadoop-common-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-common-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 18B9B920C for ; Tue, 20 Mar 2012 21:02:00 +0000 (UTC) Received: (qmail 66541 invoked by uid 500); 20 Mar 2012 21:01:59 -0000 Delivered-To: apmail-hadoop-common-issues-archive@hadoop.apache.org Received: (qmail 66505 invoked by uid 500); 20 Mar 2012 21:01:59 -0000 Mailing-List: contact common-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: common-issues@hadoop.apache.org Delivered-To: mailing list common-issues@hadoop.apache.org Received: (qmail 66497 invoked by uid 99); 20 Mar 2012 21:01:59 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Mar 2012 21:01:59 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Mar 2012 21:01:58 +0000 Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 6ACB9C83DE for ; Tue, 20 Mar 2012 21:01:38 +0000 (UTC) Date: Tue, 20 Mar 2012 21:01:38 +0000 (UTC) From: "Aaron T. Myers (Commented) (JIRA)" To: common-issues@hadoop.apache.org Message-ID: <470777022.38260.1332277298439.JavaMail.tomcat@hel.zones.apache.org> In-Reply-To: <1996082370.5467.1331594437550.JavaMail.tomcat@hel.zones.apache.org> Subject: [jira] [Commented] (HADOOP-8163) Improve ActiveStandbyElector to provide hooks for fencing old active MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/HADOOP-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233741#comment-13233741 ] Aaron T. Myers commented on HADOOP-8163: ---------------------------------------- Patch is looking pretty good, Todd. Here are some initial comments. I've so far only reviewed the non-test code. # I find it odd that we use CamelCase for the lock file name, but dash-separated-words for the most recent file name. # Not sure why we run the words "LOCKFILENAME" and "MOSTRECENT_FILE" together for constants. Underscores are cheap. # Seems odd to have "Preconditions.checkState(zkClient != null);" inside the try/catch for KeeperException. # Recommend adding something like "this method will create the znode and all parents if it does not exist" to the method comment of "ensureBaseZNode". # Seems like we should make operationSuccess, operationNodeExists, operationNodeDoesNotExist, and operationRetry static functions. I also question the usefulness of the names of these methods. Feel free to punt this to another JIRA. # Seems like ensureBaseZNode can potentially return without error, but not have actually created the desired znode, if we retry 3 times but continue to get the retry error code returned from ZK. # You left in a "TODO: should handle retries" - do you intend to address this in a separate JIRA? # Now that we have a new method which creates a different ZNode, seems like we should rename the createNode() method to make it clear that it only creates the lock file ZNode. Likewise monitorNode() should probably either be parameterized or renamed to be more specific. # Seems a little odd to throw an AssertionError in the case of unexpected app data in the ZNode we're trying to delete, though I don't have a good suggestion for a better exception. Feel free to ignore this comment. # I think it's reasonable to change the "Checking for any old active..." message to be moved from debug level to info, particularly since the "Old node exists..." message is info level. # You left in a "// TODO: think carefully about this error handling." Will this be addressed in a subsequent JIRA? > Improve ActiveStandbyElector to provide hooks for fencing old active > -------------------------------------------------------------------- > > Key: HADOOP-8163 > URL: https://issues.apache.org/jira/browse/HADOOP-8163 > Project: Hadoop Common > Issue Type: Improvement > Components: ha > Affects Versions: 0.24.0, 0.23.3 > Reporter: Todd Lipcon > Assignee: Todd Lipcon > Attachments: hadoop-8163.txt > > > When a new node becomes active in an HA setup, it may sometimes have to take fencing actions against the node that was formerly active. This JIRA extends the ActiveStandbyElector which adds an extra non-ephemeral node into the ZK directory, which acts as a second copy of the active node's information. Then, if the active loses its ZK session, the next active to be elected may easily locate the unfenced node to take the appropriate actions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira