hbase-issues mailing list archives

From "Matteo Bertozzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7987) Snapshot Manifest file instead of multiple empty files
Date Tue, 05 Mar 2013 15:47:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593501#comment-13593501 ]

Matteo Bertozzi commented on HBASE-7987:
----------------------------------------

{quote}Can the manifest be one per region?{quote}
We can, but the idea behind the manifest is to reduce the number of NameNode (NN) operations,
and if you add more manifests you end up in the same situation as before (think of stripe
compaction, with more regions and fewer files).

{quote}
You mentioned potential corruption of the manifest file; if we keep one manifest per region, the
loss would be lower in case of a corrupt manifest.
{quote}
True, but see the answer above. That is another trade-off to add to the one mentioned
above. That's why I consider this jira neither major nor a blocker, but more of an optimization
that should go in once we're sure that it is better than what we have now.
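
To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how many files each layout asks the NameNode to create. The function name, the parameters and the example cluster sizes are all hypothetical, and the model ignores directories; it only counts the files named in the issue description (empty marker files, .regioninfo copies, the table descriptor and snapshotInfo).

```python
def snapshot_file_counts(regions, files_per_region, region_servers):
    """Estimate files created per snapshot under three layouts.

    Hypothetical model, not HBase code: directories are ignored and every
    region is assumed to hold the same number of store files.
    """
    common = 2  # table descriptor + snapshotInfo, present in every layout
    return {
        # current layout: one empty file per store file, plus one
        # .regioninfo copy per region
        "empty_files": regions * files_per_region + regions + common,
        # one manifest per region (the question in the quote)
        "manifest_per_region": regions + common,
        # one manifest per region server (the proposal in this issue)
        "manifest_per_rs": region_servers + common,
    }


# Example numbers are made up for illustration only.
counts = snapshot_file_counts(regions=1000, files_per_region=10, region_servers=20)
for layout, n in counts.items():
    print(layout, n)
```

Under these assumed numbers, a per-region manifest still costs one file per region, so the count keeps growing with the number of regions, while a per-RS manifest grows only with the number of region servers.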
                
> Snapshot Manifest file instead of multiple empty files
> ------------------------------------------------------
>
>                 Key: HBASE-7987
>                 URL: https://issues.apache.org/jira/browse/HBASE-7987
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Matteo Bertozzi
>
> Currently, taking a snapshot means creating one empty file for each file in the source table directory, plus copying the .regioninfo file for each region, the table descriptor file and a snapshotInfo file.
> During the restore or snapshot verification we traverse the filesystem (fs.listStatus()) to find the snapshot files, and we open the .regioninfo files to get the information.
> To avoid hammering the NameNode and having lots of empty files, we can use a manifest file that contains the list of files and the information that we need.
> To keep the RS parallelism that we have, each RS can write its own manifest.
> {code}
> message SnapshotDescriptor {
>   required string name;
>   optional string table;
>   optional int64 creationTime;
>   optional Type type;
>   optional int32 version;
> }
> message SnapshotRegionManifest {
>   optional int32 version;
>   required RegionInfo regionInfo;
>   repeated FamilyFiles familyFiles;
>   message StoreFile {
>     required string name;
>     optional Reference reference;
>   }
>   message FamilyFiles {
>     required bytes familyName;
>     repeated StoreFile storeFiles;
>   }
> }
> {code}
> {code}
> /hbase/.snapshot/<snapshotName>
> /hbase/.snapshot/<snapshotName>/snapshotInfo
> /hbase/.snapshot/<snapshotName>/<tableName>
> /hbase/.snapshot/<snapshotName>/<tableName>/tableInfo
> /hbase/.snapshot/<snapshotName>/<tableName>/regionManifest(.n)
> {code}
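
As an illustration of the message layout quoted above, the following is a minimal Python sketch that mirrors the SnapshotRegionManifest structure with plain dataclasses and groups a flat list of store files into one manifest per region. It is not HBase code: the input format "<region>/<family>/<file>", the function build_region_manifests and the use of plain strings in place of the RegionInfo and Reference messages are assumptions made only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoreFile:
    name: str
    reference: Optional[str] = None  # stand-in for the Reference message


@dataclass
class FamilyFiles:
    family_name: bytes
    store_files: List[StoreFile] = field(default_factory=list)


@dataclass
class SnapshotRegionManifest:
    region_info: str  # stand-in for the RegionInfo message
    family_files: List[FamilyFiles] = field(default_factory=list)
    version: int = 0


def build_region_manifests(store_file_paths):
    """Group '<region>/<family>/<file>' paths (assumed format) by region."""
    grouped = defaultdict(lambda: defaultdict(list))
    for path in store_file_paths:
        region, family, name = path.split("/")
        grouped[region][family].append(StoreFile(name))
    return [
        SnapshotRegionManifest(
            region_info=region,
            family_files=[
                FamilyFiles(family.encode(), files)
                for family, files in sorted(families.items())
            ],
        )
        for region, families in sorted(grouped.items())
    ]


# Made-up region, family and file names.
manifests = build_region_manifests(
    ["r1/cf1/a", "r1/cf1/b", "r1/cf2/c", "r2/cf1/d"]
)
print(len(manifests))
```

With a structure like this, restore or verification could read the file list from the manifest entries rather than discovering it through fs.listStatus() calls; how the entries are serialized and split across regionManifest(.n) files is left open here.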

