Date: Mon, 10 Oct 2011 02:12:29 +0000 (UTC)
From: "bluedavy (Commented) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <2026202335.13407.1318212749875.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <328050158.33917.1305004383516.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-3872) Hole in split transaction rollback; edits to .META.
need to be rolled back even if it seems like they didn't make it
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

    [ https://issues.apache.org/jira/browse/HBASE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123837#comment-13123837 ]

bluedavy commented on HBASE-3872:
---------------------------------

We fixed the bug with the change below, which adds the PONR journal entry before the .META. edit instead of after it:

 if (!testing) {
+  this.journal.add(JournalEntry.PONR);
   MetaEditor.offlineParentInMeta(server.getCatalogTracker(),
     this.parent.getRegionInfo(), a.getRegionInfo(), b.getRegionInfo());
 }
-this.journal.add(JournalEntry.PONR);

> Hole in split transaction rollback; edits to .META. need to be rolled back even if it seems like they didn't make it
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-3872
> URL: https://issues.apache.org/jira/browse/HBASE-3872
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.90.3
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.4
>
> Attachments: 3872-v2.txt, 3872.txt
>
>
> Saw this interesting one on a cluster of ours. The cluster was configured with too few handlers, so we saw lots of the phenomenon where actions were queued but, by the time they got into the server and tried to respond to the client, the client had disconnected because of the 60-second timeout. Well, the meta edits for a split were queued at the regionserver carrying .META., and by the time it went to write back, the client had gone (the first insert of parent offline with daughter regions added as info:splitA and info:splitB). The client presumed the edits failed and 'successfully' rolled back the transaction (failing to undo .META. edits, thinking they didn't go through).
> A few minutes later the .META.
> scanner on master runs. It sees 'no references' in daughters -- the daughters had been cleaned up as part of the split transaction rollback -- so it thinks it's safe to delete the parent.
> Two things:
> + Tighten up the check in master... it needs to check that the daughter region at least exists, and possibly that the daughter region has an entry in .META.
> + Depending on which edit fails, schedule rollback edits even though it will seem like they didn't go through.
> This is a pretty critical one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
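
To make the ordering change in the comment above concrete, here is a minimal, self-contained sketch of why journaling the point of no return before the .META. edit matters. It is not HBase code: SplitJournalSketch, FlakyMeta, runSplit and rollback are hypothetical stand-ins, and the "edit applied but reply lost" behaviour is simulated rather than taken from a real cluster. What the caller does after rollback refuses is outside the sketch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these names are HBase APIs.
class SplitJournalSketch {
    enum Step { CREATED_DAUGHTERS, PONR }

    /** Simulates a .META. edit that is applied on the server while the reply to the client is lost. */
    static final class FlakyMeta {
        boolean parentOffline = false;

        void offlineParent() throws IOException {
            parentOffline = true;                        // the edit does land on the server
            throw new IOException("client timed out");   // but the caller only sees a failure
        }
    }

    /**
     * Runs a simulated split against the given meta.
     * Returns true if rollback went ahead and the daughters were cleaned up.
     */
    static boolean runSplit(FlakyMeta meta, boolean journalBeforeEdit) {
        List<Step> journal = new ArrayList<>();
        journal.add(Step.CREATED_DAUGHTERS);
        try {
            if (journalBeforeEdit) {
                journal.add(Step.PONR);   // ordering from the comment's fix
            }
            meta.offlineParent();
            if (!journalBeforeEdit) {
                journal.add(Step.PONR);   // original ordering: never reached when the call throws
            }
        } catch (IOException e) {
            return rollback(journal);
        }
        return false;
    }

    /** In this sketch, rollback refuses to clean up once PONR has been journaled. */
    static boolean rollback(List<Step> journal) {
        return !journal.contains(Step.PONR);
    }

    public static void main(String[] args) {
        for (boolean journalBeforeEdit : new boolean[] { false, true }) {
            FlakyMeta meta = new FlakyMeta();
            boolean daughtersCleanedUp = runSplit(meta, journalBeforeEdit);
            System.out.println("journalBeforeEdit=" + journalBeforeEdit
                + " parentOffline=" + meta.parentOffline
                + " daughtersCleanedUp=" + daughtersCleanedUp);
        }
    }
}
```

With the original ordering the simulated rollback removes the daughters even though the parent is already offline in the simulated .META., which is the state the issue describes. With the PONR entry journaled first, the rollback in this sketch declines to clean up.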