Date: Fri, 24 Aug 2012 01:52:42 +1100 (NCT)
From: "Himanshu Vashishtha (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1105572906.5731.1345733562685.JavaMail.jiratomcat@arcas>
In-Reply-To: <1279073491.43210.1331295657808.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

    [ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440351#comment-13440351 ]

Himanshu Vashishtha commented on HBASE-5549:
--------------------------------------------

@Stack: So, currently (in this patch also), we don't wait for the Expired event, which is not correct. The v2 patch in 6354 also doesn't look for it.
I added this check (as mentioned earlier), but sometimes the only event received is SyncConnected, and not the Expired event. So it keeps waiting for the AtomicBoolean sessionClosed to become true, which will happen only when the watcher receives the Expired event. Thinking about it today, it looks like we should invoke the close method only *after* the monitorwatcher is properly initialized. Hmmm, I will look at this again today.

> Master can fail if ZooKeeper session expires
> --------------------------------------------
>
> Key: HBASE-5549
> URL: https://issues.apache.org/jira/browse/HBASE-5549
> Project: HBase
> Issue Type: Bug
> Components: master, zookeeper
> Affects Versions: 0.96.0
> Environment: all
> Reporter: nkeywal
> Assignee: nkeywal
> Priority: Minor
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: 5549_092.txt, 5549_094.txt, 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
>
>
> There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection.
> This can happen in real life, for example when:
> - master & zookeeper start
> - zookeeper connection is cut
> - master enters the retry loop
> - in the meantime the session expires
> - the network comes back, the session is recreated
> - the retries continue, but on the wrong object, and hence fail.
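
To illustrate the wait discussed in the comment above, here is a minimal sketch of waiting for the Expired event with a timeout, so that the caller does not block forever when only SyncConnected arrives. Everything in it is an assumption for illustration: SessionExpiryWaiter, its State enum (a local stand-in for ZooKeeper's KeeperState) and awaitExpired are hypothetical names, not code from HBase or from any attached patch, and a CountDownLatch is swapped in for the AtomicBoolean sessionClosed mentioned in the comment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper, not part of HBase or of any attached patch.
 * State is a local stand-in for ZooKeeper's Watcher.Event.KeeperState.
 */
final class SessionExpiryWaiter {
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    // Released once, the first time EXPIRED is observed.
    private final CountDownLatch expired = new CountDownLatch(1);

    /** Called for every session event the (simulated) watcher receives. */
    void process(State state) {
        if (state == State.EXPIRED) {
            expired.countDown();
        }
    }

    /**
     * Waits up to the given timeout for EXPIRED.
     * Returns false on timeout or if the waiting thread is interrupted.
     */
    boolean awaitExpired(long timeout, TimeUnit unit) {
        try {
            return expired.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        SessionExpiryWaiter waiter = new SessionExpiryWaiter();

        // Only SYNC_CONNECTED arrives: the bounded wait times out instead of hanging.
        waiter.process(State.SYNC_CONNECTED);
        System.out.println("expired seen: " + waiter.awaitExpired(100, TimeUnit.MILLISECONDS));

        // Once EXPIRED arrives, the wait returns immediately.
        waiter.process(State.EXPIRED);
        System.out.println("expired seen: " + waiter.awaitExpired(100, TimeUnit.MILLISECONDS));
    }
}
```

With a bounded wait, a false return lets the caller decide what to do next (retry, log, or fail the test) rather than blocking on an event that may never be delivered. Whether this fits the actual patch depends on when the watcher is registered, which is the open question raised in the comment.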