Date: Tue, 3 Sep 2013 18:57:55 +0000 (UTC)
From: "Hudson (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-9397) Snapshots with the same name are allowed to proceed concurrently

    [ https://issues.apache.org/jira/browse/HBASE-9397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756917#comment-13756917 ]

Hudson commented on HBASE-9397:
-------------------------------

FAILURE: Integrated in HBase-0.94 #1131 (See [https://builds.apache.org/job/HBase-0.94/1131/])
HBASE-9397 Snapshots with the same name are allowed to proceed concurrently (Jerry He) (mbertozzi: rev 1519767)
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/snapshot/SnapshotManager.java


> Snapshots with the same name are allowed to proceed concurrently
> ----------------------------------------------------------------
>
>                 Key: HBASE-9397
>                 URL: https://issues.apache.org/jira/browse/HBASE-9397
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>    Affects Versions: 0.95.2, 0.94.11
>            Reporter: Jerry He
>            Assignee: Jerry He
>             Fix For: 0.94.12, 0.96.0
>
>         Attachments: HBASE-9397-0.94.patch, HBASE-9397-0.94-v2.patch, HBASE-9397-trunk.patch, HBASE-9397-trunk-v2.patch
>
>
> Snapshots with the same name (but on different tables) are allowed to proceed concurrently.
> This seems to be a loophole created by allowing multiple snapshots (on different tables) to run concurrently.
> There are two checks in SnapshotManager, but they fail to catch this particular case.
> In isSnapshotCompleted(), we only check the completed snapshot directory.
> In isTakingSnapshot(), we only check for the same table name.
> The end result is that concurrently running snapshots with the same name overlap and interfere with each other, for example by cleaning up the other's snapshot working directory in .hbase-snapshot/.tmp/snapshot-name.
> {code}
> 2013-08-29 18:25:13,443 ERROR org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler: Failed taking snapshot { ss=mysnapshot table=TestTable type=FLUSH } due to exception:Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:hdfs://hdtest009:9000/hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
>     at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:321)
>     at org.apache.hadoop.hbase.master.snapshot.MasterSnapshotVerifier.verifySnapshotDescription(MasterSnapshotVerifier.java:123)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
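
The gap described in the issue (a per-table check that misses a same-named snapshot running on another table) can be illustrated with a small, self-contained sketch. This is not the HBase SnapshotManager code or the attached patch; the class SnapshotNameGuard and all of its method names are hypothetical and exist only to model the idea of checking the snapshot name across every in-progress snapshot before a new one starts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the checks discussed above.
// It does not use or mirror the real HBase SnapshotManager API.
class SnapshotNameGuard {
    // table name -> name of the snapshot currently running on that table
    private final Map<String, String> inProgressByTable = new HashMap<>();

    // Table-only check, analogous to what the report says isTakingSnapshot() does.
    public synchronized boolean isTakingSnapshotOnTable(String table) {
        return inProgressByTable.containsKey(table);
    }

    // Extra check (assumed here): is any running snapshot, on any table, using this name?
    public synchronized boolean isSnapshotNameInProgress(String snapshotName) {
        return inProgressByTable.containsValue(snapshotName);
    }

    // Registers the snapshot only if neither the table nor the name is already in use.
    public synchronized boolean tryStart(String snapshotName, String table) {
        if (isTakingSnapshotOnTable(table) || isSnapshotNameInProgress(snapshotName)) {
            return false;
        }
        inProgressByTable.put(table, snapshotName);
        return true;
    }

    // Called when the snapshot on the given table completes or fails.
    public synchronized void finish(String table) {
        inProgressByTable.remove(table);
    }
}
```

With only the table check, a second request for "mysnapshot" on a different table would be accepted and both runs would share the same working directory under .hbase-snapshot/.tmp; with the name check in this sketch, the second request is rejected until the first finishes.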