Date: Thu, 28 Oct 2010 15:14:23 -0400 (EDT)
From: "HBase Review Board (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-3159) Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master

[ https://issues.apache.org/jira/browse/HBASE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925920#action_12925920 ]

HBase Review Board commented on HBASE-3159:
-------------------------------------------
Message from: stack@duboce.net
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1108/
-----------------------------------------------------------

Review request for hbase and Jonathan Gray.


Summary
-------

Here is the patch I've been testing up on the cluster. It adds debugging and two fixes: first, setting the region's state to OPEN on receipt of a regionserver OPENED event; second, resetting the watcher when getting data in ZKUtil (as its documentation claims it does).


This addresses bug hbase-3159.
    http://issues.apache.org/jira/browse/hbase-3159


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 30e49c8
  src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java 1a88700
  src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java d4fa82b
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 81661ef
  src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 21a4256

Diff: http://review.cloudera.org/r/1108/diff


Testing
-------

Running on cluster.


Thanks,

stack


> Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3159
>                 URL: https://issues.apache.org/jira/browse/HBASE-3159
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>         Attachments: hbase-meta-dupe-opened-master-only.txt, hbase-meta-dupe-opened.txt, master-root-assign-abort.log, rs_death_on_meta_open_no_root.txt, TestRollingRestart-v4.patch
>
>
> Here is the master log with annotations: http://people.apache.org/~stack/master.txt
> The region in question is:
> b8827a67a9d446f345095d25e1f375f7
> The running code is doctored: I've added a bit of logging (zk in particular), and I've also removed what I thought was a provocation of this condition, namely the reassign inside an assign when the server has gone away by the time we try the open RPC. (It turns out we hit the condition even without this code in place.)
> The log starts where the region in question times out in RIT.
> We assign it to 186.
> Notice how we see 'Handling transition' for this region TWICE. This means two OpenedRegionHandlers get scheduled, and so the second one fails trying to delete a znode that is already gone.
> As best I can tell, the watcher for this region is triggered only once. That is odd: how, then, does OpenedRegionHandler get scheduled twice? And why do I not see OPENING, OPENING, OPENED, but only what I presume is an OPENED?
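The first fix in the summary, setting the region's state to OPEN as soon as the regionserver's OPENED event is received, is what lets the master shrug off a replayed transition instead of scheduling a second OpenedRegionHandler that then fails on the already-deleted znode. Below is a minimal sketch of that guard, assuming a single master process with in-memory region state. The class, enum and method names (OpenedTransitionDemo, regionOpening, handleOpened, scheduledHandlers) are hypothetical and are not HBase's AssignmentManager API. It records which handlers would be scheduled rather than running them, and it does not model ZooKeeper itself. (ZooKeeper watches are one-time triggers, so a client must re-set a watch, for example by passing a watcher to getData, to see the next change; that is the concern behind the second fix.)

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not HBase code: a master-side table of region states in which
// an OPENED notification schedules a handler only if it moves the region OPENING -> OPEN.
public class OpenedTransitionDemo {
    enum State { OPENING, OPEN }

    // Hypothetical in-memory bookkeeping keyed by encoded region name.
    private final Map<String, State> regionStates = new HashMap<>();
    // Stand-in for an executor: records which handlers would have been scheduled.
    final List<String> scheduledHandlers = new ArrayList<>();

    synchronized void regionOpening(String region) {
        regionStates.put(region, State.OPENING);
    }

    // Called once per OPENED notification. Returns true only for the notification that
    // performs the OPENING -> OPEN change; a replayed event finds the region already OPEN.
    synchronized boolean handleOpened(String region) {
        // Map.replace(key, oldValue, newValue) only swaps when the current value matches.
        if (!regionStates.replace(region, State.OPENING, State.OPEN)) {
            return false; // duplicate or unexpected event: do not schedule a second handler
        }
        scheduledHandlers.add("OpenedRegionHandler(" + region + ")");
        return true;
    }

    synchronized State stateOf(String region) {
        return regionStates.get(region);
    }

    public static void main(String[] args) {
        OpenedTransitionDemo master = new OpenedTransitionDemo();
        String region = "b8827a67a9d446f345095d25e1f375f7";
        master.regionOpening(region);
        System.out.println(master.handleOpened(region));     // true: first OPENED event
        System.out.println(master.handleOpened(region));     // false: replayed event ignored
        System.out.println(master.scheduledHandlers.size()); // 1
    }
}
```

Making the state change and the scheduling decision a single compare-and-set under one lock means two deliveries of the same transition cannot both pass the check, whatever caused the duplicate delivery.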