hbase-issues mailing list archives

From "Ashu Pachauri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18399) Files in a snapshot can go missing even after the snapshot is taken successfully
Date Mon, 24 Jul 2017 18:40:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098961#comment-16098961 ]

Ashu Pachauri commented on HBASE-18399:

[~ram_krish] I am sorry I did not see your previous comment. I actually started working on
HBASE-18398, which after some investigation seems to share the same underlying problem
as this issue: the snapshot operation is done under a region-level read lock, while the active
store file list is updated under the store-level lock. This means that, as you suggested,
it could very well happen prior to 1.3, and I don't have a concrete explanation as to why
it did not happen (or was not noticeable). One reason could be that in branch-1.3 the archival
happens asynchronously in the HFileArchiver, as opposed to being done on the compaction path
prior to branch-1.3.
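
To illustrate the lock mismatch described above, here is a minimal sketch in Python rather than the HBase Java code. The names `region_lock`, `store_lock` and `snapshot_with_concurrent_compaction` are hypothetical stand-ins, not HBase APIs; the only point is that holding one lock does nothing to exclude a writer that takes a different lock.

```python
import threading


def snapshot_with_concurrent_compaction():
    """Model: snapshot holds the region-level lock, compaction uses the store-level lock."""
    # Hypothetical stand-ins for the two locks mentioned in the comment.
    region_lock = threading.Lock()  # plays the role of the region-level read lock
    store_lock = threading.Lock()   # plays the role of the store-level lock
    active_files = ["store_file_A", "store_file_B"]
    manifest = []

    with region_lock:
        # The snapshot visits the active store files but has not written them yet.
        visited = list(active_files)

        # A compaction only needs the store-level lock, so it is not blocked
        # by the region-level lock the snapshot is holding.
        got_store_lock = store_lock.acquire(blocking=False)
        if got_store_lock:
            try:
                active_files.remove("store_file_A")  # compacted away
            finally:
                store_lock.release()

        # The snapshot now records a file that is no longer active.
        manifest.extend(visited)

    return got_store_lock, manifest, active_files


if __name__ == "__main__":
    print(snapshot_with_concurrent_compaction())
```

In this model the store update succeeds while the snapshot is still in progress, so the manifest ends up referencing a file that has already left the active list.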

I am working on a solution to HBASE-18398 which, I believe, should be able to fix this too.

> Files in a snapshot can go missing even after the snapshot is taken successfully
> --------------------------------------------------------------------------------
>                 Key: HBASE-18399
>                 URL: https://issues.apache.org/jira/browse/HBASE-18399
>             Project: HBase
>          Issue Type: Sub-task
>          Components: regionserver, Scanners
>            Reporter: Ashu Pachauri
>             Fix For: 1.3.2
> Files missing after the snapshot is taken (only applicable when the TTL for the TimeToLiveHFileCleaner
> is small, like the default 5 mins)
>     * SnapshotManifest#addRegion visits store_file_A, but is yet to write it to the manifest.
>     * store_file_A is marked as compacted away and HFileArchiver moves the file to archive.
>     * HFileCleaner comes in and sees store_file_A in archive. It adds the file to
>       the list of files that might need to be cleaned up.
>     * HFileCleaner's SnapshotHFileCleaner plugin kicks in.
>     * SnapshotFileCache#getUnreferencedFiles also says that store_file_A is unreferenced
>       and should be cleaned up (it has not yet been written to the manifest).
>     * SnapshotHFileCleaner is still going through the rest of the files in archive.
>     * The store_file_A reference is created and written to the snapshot manifest.
>     * Snapshot verification runs and sees that store_file_A is present in archive, and
>       thus the verification passes.
>     * Now, the SnapshotHFileCleaner finishes and TimeToLiveHFileCleaner is triggered.
>       If the TTL has passed since store_file_A was moved to archive (SnapshotHFileCleaner could
>       easily take several minutes to go through the rest of the files), the TimeToLiveHFileCleaner also
>       marks the file as deletable.
>     * Since all cleaner plugins marked the file as deletable, store_file_A is deleted.
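
The interleaving in the quoted description can be replayed deterministically. The following is a simplified Python model, not HBase code: `replay_race`, the two `*_says_deletable` helpers, and the `life > TTL` comparison are assumptions made for illustration, and time is just a number of seconds since store_file_A was archived.

```python
TTL_SECONDS = 5 * 60  # the default TTL mentioned in the description


def snapshot_cleaner_says_deletable(name, manifest):
    # Model of the snapshot plugin: a file is deletable if no manifest references it.
    return name not in manifest


def ttl_cleaner_says_deletable(archived_at, now, ttl=TTL_SECONDS):
    # Model of the TTL plugin: a file is deletable once it has been archived longer than ttl.
    return now - archived_at > ttl


def replay_race(snapshot_cleaner_duration):
    """Replay the steps; the argument is how long the snapshot plugin takes, in seconds."""
    archive = {}      # file name -> time it was moved to archive
    manifest = set()  # files referenced by the snapshot

    # addRegion has visited store_file_A; the file is then archived at t = 0.
    archive["store_file_A"] = 0
    # The snapshot plugin checks the file before the manifest has been written.
    snapshot_vote = snapshot_cleaner_says_deletable("store_file_A", manifest)
    # The reference is written to the manifest and verification runs.
    manifest.add("store_file_A")
    verified = all(f in archive for f in manifest)
    # The snapshot plugin finishes; the TTL plugin runs at this later time.
    now = snapshot_cleaner_duration
    ttl_vote = ttl_cleaner_says_deletable(archive["store_file_A"], now)
    # The file is deleted only if every plugin marked it deletable.
    if snapshot_vote and ttl_vote:
        del archive["store_file_A"]

    missing = sorted(f for f in manifest if f not in archive)
    return verified, missing


if __name__ == "__main__":
    print(replay_race(6 * 60))  # snapshot plugin slower than the TTL
    print(replay_race(60))      # snapshot plugin faster than the TTL
```

In this model the snapshot always verifies, but the file goes missing only when the snapshot plugin runs longer than the TTL, which matches the condition stated at the top of the description.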

This message was sent by Atlassian JIRA