Return-Path:
X-Original-To: apmail-hbase-issues-archive@www.apache.org
Delivered-To: apmail-hbase-issues-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B23009330 for ; Mon, 9 Jan 2012 13:19:16 +0000 (UTC)
Received: (qmail 66110 invoked by uid 500); 9 Jan 2012 13:19:16 -0000
Delivered-To: apmail-hbase-issues-archive@hbase.apache.org
Received: (qmail 65094 invoked by uid 500); 9 Jan 2012 13:19:05 -0000
Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Delivered-To: mailing list issues@hbase.apache.org
Received: (qmail 64431 invoked by uid 99); 9 Jan 2012 13:19:02 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 09 Jan 2012 13:19:02 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 09 Jan 2012 13:19:00 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116]) by hel.zones.apache.org (Postfix) with ESMTP id 4D27A13F054 for ; Mon, 9 Jan 2012 13:18:40 +0000 (UTC)
Date: Mon, 9 Jan 2012 13:18:40 +0000 (UTC)
From: "ramkrishna.s.vasudevan (Created) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1917034841.21435.1326115120317.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
-----------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-5155
                 URL: https://issues.apache.org/jira/browse/HBASE-5155
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.90.4
            Reporter: ramkrishna.s.vasudevan
            Priority: Blocker


ServerShutdownHandler and the disable/delete table handlers race. This is not an issue due to TM.

-> A regionserver goes down. In our cluster the regionserver holds a lot of regions.
-> A region R1 has two daughters, D1 and D2.
-> The ServerShutdownHandler gets called, scans META, and gets all the user regions.
-> In parallel, a table is disabled. (No problem in this step.)
-> Delete table is done.
-> The table and its regions are deleted, including R1, D1 and D2. (So META is cleaned.)
-> Now ServerShutdownHandler starts to process the dead region:
{code}
 if (hri.isOffline() && hri.isSplit()) {
      LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
        "; checking daughter presence");
      fixupDaughters(result, assignmentManager, catalogTracker);
{code}

As part of fixupDaughters, since the daughters D1 and D2 are missing for R1
{code}
    if (isDaughterMissing(catalogTracker, daughter)) {
      LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
      MetaEditor.addDaughter(catalogTracker, daughter, null);

      // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
      // there then something wonky about the split -- things will keep going
      // but could be missing references to parent region.

      // And assign it.
      assignmentManager.assign(daughter, true);
{code}
we call assign on the daughters.
After this we continue with the code below.
{code}
        if (processDeadRegion(e.getKey(), e.getValue(),
            this.services.getAssignmentManager(),
            this.server.getCatalogTracker())) {
          this.services.getAssignmentManager().assign(e.getKey(), true);
{code}
Now when the SSH scanned the META it had R1, D1 and D2.
So as part of the above code, D1 and D2, which were already assigned by fixupDaughters, are assigned again by
{code}
this.services.getAssignmentManager().assign(e.getKey(), true);
{code}
This leads to a ZooKeeper failure due to a bad version, which kills the master.
The important part here is that the regions that were deleted are recreated, which I think is more critical.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira