Date: Wed, 8 Apr 2015 17:20:13 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-13430) HFiles that are in use by a table cloned from a snapshot may be deleted when that snapshot is deleted

     [ https://issues.apache.org/jira/browse/HBASE-13430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-13430:
--------------------------
    Priority: Critical  (was: Major)

> HFiles that are in use by a table cloned from a snapshot may be deleted when that snapshot is deleted
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-13430
>                 URL: https://issues.apache.org/jira/browse/HBASE-13430
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>            Reporter: Tobi Vollebregt
>            Priority: Critical
>              Labels: data-integrity, master
>         Attachments: hbase-13430-attempted-fix.patch,
>                      hbase-13430-test.patch
>
>
> We recently had a production issue in which HFiles that were still in use by a table were deleted. This appears to have been caused by race conditions in the order in which HFileLinks are created, combined with the fact that only files younger than {{hbase.master.hfilecleaner.ttl}} are kept alive.
> This is how to reproduce:
> * Clone a large snapshot into a new table. The clone operation must take more than {{hbase.master.hfilecleaner.ttl}} time to guarantee data loss.
> * Ensure that no other table or snapshot is referencing the HFiles used by the new table.
> * Delete the snapshot. This breaks the table.
> The main cause is this:
> * Cloning a snapshot creates the table in the {{HBASE_TEMP_DIRECTORY}}.
> * However, it immediately creates back references to the HFileLinks that it creates for the table in the archive directory.
> * HFileLinkCleaner does not check the {{HBASE_TEMP_DIRECTORY}}, so it considers all those back references deletable.
> * The only thing that keeps them alive is the TimeToLiveHFileCleaner, but only for 5 minutes.
> * So if cloning the snapshot takes more than 5 minutes, and the HFiles aren't referenced by anything else, data loss is guaranteed.
> I have a unit test reproducing the issue and I tried to fix this, but didn't completely succeed. I will attach the patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
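
The failure sequence described in the report can be sketched as a toy model. This is not HBase code: the function names, the one-minute `cleaner_interval`, and the rule that a back reference is removed only when both cleaners agree are assumptions made for illustration. The 5-minute TTL is the figure given in the report.

```python
# Toy model of the race described in the report. Not HBase code:
# all names here are hypothetical and the cleaner interval is assumed.

TTL_SECONDS = 5 * 60  # the 5-minute hbase.master.hfilecleaner.ttl cited in the report


def back_ref_deletable(age, link_visible_in_data_dir, ttl=TTL_SECONDS):
    """Return True if both modeled cleaners agree the back reference can go."""
    # Models TimeToLiveHFileCleaner: protects files younger than the TTL.
    ttl_says_deletable = age > ttl
    # Models HFileLinkCleaner: it only sees links in the data directory,
    # so a link that still lives in the temp directory does not count.
    link_cleaner_says_deletable = not link_visible_in_data_dir
    return ttl_says_deletable and link_cleaner_says_deletable


def back_ref_survives_clone(clone_duration, ttl=TTL_SECONDS, cleaner_interval=60):
    """Return True if the back reference still exists when the clone finishes.

    The back reference is created at time 0. Until the clone completes, the
    table (and its HFileLinks) sit in the temp directory, so the link is
    invisible to the modeled link cleaner.
    """
    t = cleaner_interval
    while t < clone_duration:
        if back_ref_deletable(t, link_visible_in_data_dir=False, ttl=ttl):
            return False
        t += cleaner_interval
    return True


if __name__ == "__main__":
    print(back_ref_survives_clone(4 * 60))   # clone shorter than the TTL
    print(back_ref_survives_clone(10 * 60))  # clone longer than the TTL
```

In this model a 4-minute clone keeps its back reference, while a 10-minute clone loses it at the first cleaner pass after the TTL expires. Once the back reference is gone, deleting the snapshot leaves nothing that marks the archived HFiles as in use.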