Date: Wed, 12 Jun 2013 04:51:07 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-7245) Recovery on failed snapshot restore

     [ https://issues.apache.org/jira/browse/HBASE-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-7245:
-------------------------
    Fix Version/s:     (was: 0.95.1)
                   0.95.2

> Recovery on failed snapshot restore
> -----------------------------------
>
>                 Key: HBASE-7245
>                 URL: https://issues.apache.org/jira/browse/HBASE-7245
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jonathan Hsieh
>            Assignee: Matteo Bertozzi
>             Fix For: 0.95.2
>
>
> Restore will do updates to the file system and to meta. It seems that an inopportune failure before meta is completely updated could result in an inconsistent state that would require hbck to fix.
> We should define what the semantics are for recovering from this. Some suggestions:
> 1) Fail forward (see some log entry saying the restore's meta edits were not completed, then gather the information necessary to rebuild it all from the fs, and complete the meta edits).
> 2) Fail backward (see some log entry saying the restore's meta edits were not completed, then delete the incomplete snapshot region entries from meta).
> I think I prefer 1: if two processes somehow end up updating (say the original master didn't die and a new one started up), their updates would be idempotent. If we used 2, we could still have a race and still be in a bad place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
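
The fail-forward option (suggestion 1 in the description) might be sketched roughly as below. This is a toy in-memory model, not HBase code: `Cluster`, `rebuild_meta_from_fs`, `recover`, and the `journal` marker are all hypothetical names introduced for illustration, and the "log saying restore's meta edits not completed" is assumed to be a simple per-table flag. The point it tries to show is that rebuilding meta from the file system is an upsert, so running it twice (for example from two masters) leaves the same result.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for the file system, meta, and a restore marker."""
    fs: dict = field(default_factory=dict)       # table -> set of region names on disk
    meta: dict = field(default_factory=dict)     # region name -> owning table
    journal: dict = field(default_factory=dict)  # table -> "IN_PROGRESS" or "DONE"


def rebuild_meta_from_fs(cluster: Cluster, table: str) -> None:
    """Make meta for `table` match what is on disk. Safe to repeat."""
    on_disk = cluster.fs.get(table, set())
    # Remove meta entries for regions of this table that no longer exist on disk.
    stale = [r for r, t in cluster.meta.items() if t == table and r not in on_disk]
    for region in stale:
        del cluster.meta[region]
    # Upsert one entry per region on disk; a second pass changes nothing.
    for region in on_disk:
        cluster.meta[region] = table


def recover(cluster: Cluster, table: str) -> bool:
    """Fail forward: finish the meta edits if a restore was left incomplete.

    Returns True if recovery ran, False if no restore was pending.
    """
    if cluster.journal.get(table) != "IN_PROGRESS":
        return False
    rebuild_meta_from_fs(cluster, table)
    cluster.journal[table] = "DONE"
    return True
```

Because `rebuild_meta_from_fs` only converges meta toward the on-disk state, a second process repeating it does no harm. A fail-backward variant that deletes entries could instead race with a process that is still adding them, which is the concern raised in the description.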