hbase-dev mailing list archives

From "Tobi Vollebregt (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-13430) HFiles that are in use by a table cloned from a snapshot may be deleted when that snapshot is deleted
Date Wed, 08 Apr 2015 17:00:19 GMT
Tobi Vollebregt created HBASE-13430:
---------------------------------------

             Summary: HFiles that are in use by a table cloned from a snapshot may be deleted when that snapshot is deleted
                 Key: HBASE-13430
                 URL: https://issues.apache.org/jira/browse/HBASE-13430
             Project: HBase
          Issue Type: Bug
          Components: hbase
            Reporter: Tobi Vollebregt


We recently had a production issue in which HFiles that were still in use by a table were
deleted. This appears to have been caused by race conditions in the order in which HFileLinks
are created, combined with the fact that only files younger than {{hbase.master.hfilecleaner.ttl}}
are kept alive.

This is how to reproduce:

 * Clone a large snapshot into a new table. The clone operation must take longer than {{hbase.master.hfilecleaner.ttl}}
to guarantee data loss.
 * Ensure that no other table or snapshot is referencing the HFiles used by the new table.
 * Delete the snapshot. This breaks the table.

The main cause is this:

 * Cloning a snapshot creates the table in the {{HBASE_TEMP_DIRECTORY}}.
 * However, it immediately creates back references in the archive directory for the HFileLinks
that it creates for the table.
 * HFileLinkCleaner does not check the {{HBASE_TEMP_DIRECTORY}}, so it considers all those
back references deletable.
 * The only thing that keeps them alive is the TimeToLiveHFileCleaner, but only for {{hbase.master.hfilecleaner.ttl}} (5 minutes by default).
 * So if cloning the snapshot takes more than 5 minutes, and the HFiles aren't referenced
by anything else, data loss is guaranteed.
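
The interaction described above can be illustrated with a small toy model. This is only a sketch, not the HBase implementation: the class {{CleanerRaceSketch}}, its methods, and the directory labels "data" and "temp" are hypothetical names chosen for illustration. It assumes that an archived file is deleted only when both the TTL check and the link check consider it deletable, and that the link check only honours back references that point into directories it actually inspects.

```java
import java.util.List;
import java.util.Set;

/** Toy model of the two cleaner checks; all names here are hypothetical. */
public class CleanerRaceSketch {

    /** Stand-in for an archived HFile: its age and where its back references point. */
    public static final class ArchivedFile {
        final String name;
        final long ageMillis;
        final List<String> backRefDirs;

        public ArchivedFile(String name, long ageMillis, List<String> backRefDirs) {
            this.name = name;
            this.ageMillis = ageMillis;
            this.backRefDirs = backRefDirs;
        }
    }

    /** TTL check: the file becomes deletable once it is older than the TTL. */
    public static boolean ttlAllowsDelete(ArchivedFile f, long ttlMillis) {
        return f.ageMillis > ttlMillis;
    }

    /** Link check: deletable unless a back reference points into an inspected directory. */
    public static boolean linkCheckAllowsDelete(ArchivedFile f, Set<String> inspectedDirs) {
        return f.backRefDirs.stream().noneMatch(inspectedDirs::contains);
    }

    /** The file is removed only if every check agrees that it is deletable. */
    public static boolean isDeletable(ArchivedFile f, long ttlMillis, Set<String> inspectedDirs) {
        return ttlAllowsDelete(f, ttlMillis) && linkCheckAllowsDelete(f, inspectedDirs);
    }

    public static void main(String[] args) {
        long ttl = 5 * 60 * 1000L; // 5 minutes, as in the report

        // The clone is still in progress, so the only back reference points at the temp directory.
        ArchivedFile young = new ArchivedFile("hfile-1", 2 * 60 * 1000L, List.of("temp"));
        ArchivedFile old = new ArchivedFile("hfile-1", 6 * 60 * 1000L, List.of("temp"));

        Set<String> withoutTemp = Set.of("data");       // behaviour described in the report
        Set<String> withTemp = Set.of("data", "temp");  // one conceivable fix direction

        System.out.println(isDeletable(young, ttl, withoutTemp)); // false: TTL still protects it
        System.out.println(isDeletable(old, ttl, withoutTemp));   // true: nothing protects it
        System.out.println(isDeletable(old, ttl, withTemp));      // false: back reference is honoured
    }
}
```

In this model the file survives only as long as the TTL check protects it, which matches the observation that a clone taking longer than the TTL loses its HFiles. Treating the temp directory as inspected is just one way to show the effect in the model; it is not a claim about what the eventual patch does.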

I have a unit test reproducing the issue and I tried to fix this, but didn't completely succeed.
I will attach the patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
