hbase-issues mailing list archives

From "Matteo Bertozzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7987) Snapshot Manifest file instead of multiple empty files
Date Tue, 05 Mar 2013 15:25:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593471#comment-13593471 ]

Matteo Bertozzi commented on HBASE-7987:
----------------------------------------

A snapshot is a set of empty files, since we don't need their content.
Basically, the restore uses only the file names to know what to restore
(the only special case is a reference file).

{quote}What would be the recommended format when this optimization is in place ? Are we going
to provide option for user to switch between the former and this optimization ?{quote}
Again, I'm not sure what you mean. If we go for a format2, there will be only format2 on
write, and both formats on read, as with the HFile.
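
To make the "one format on write, both on read" rule concrete, here is a minimal sketch. It is not HBase code: the snapshot directory is modelled as a plain dict of file name to content, JSON stands in for the real serialization, and the names `write_snapshot_v2`, `read_snapshot` and `MANIFEST_NAME` are hypothetical.

```python
import json

# Hypothetical manifest file name, borrowed from the layout in the description.
MANIFEST_NAME = "regionManifest"


def write_snapshot_v2(store_files):
    """Write path: always emit the newest format (one manifest, no empty files)."""
    manifest = {"version": 2, "files": sorted(store_files)}
    return {MANIFEST_NAME: json.dumps(manifest)}


def read_snapshot(snapshot_dir):
    """Read path: accept both formats and return the sorted list of store files."""
    if MANIFEST_NAME in snapshot_dir:
        # Format 2: the file list is stored inside the manifest.
        return json.loads(snapshot_dir[MANIFEST_NAME])["files"]
    # Format 1: every entry is an empty marker file; only its name matters.
    return sorted(snapshot_dir)


# A snapshot taken with the old format (empty files) and one taken with the new format.
old_snapshot = {"region1/cf/hfile-a": "", "region1/cf/hfile-b": ""}
new_snapshot = write_snapshot_v2(["region1/cf/hfile-b", "region1/cf/hfile-a"])
```

Under these assumptions, a restore would call `read_snapshot` and get the same file list whichever format the snapshot was taken with, so no user-facing switch is needed.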

The format that I've proposed is the one in the description. Basically, each RS dumps a list
of SnapshotRegionManifest (one file per RS).
The master writes the .snapshotInfo and the .tableInfo as it does today.
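
As a rough illustration of "one manifest file per RS", the sketch below mirrors the proposed messages with plain dataclasses and groups region manifests by the server hosting each region. Everything here is an assumption made for illustration: `region_info` is a simple string standing in for RegionInfo, `reference` is a string standing in for Reference, and the `regionManifest.<n>` numbering and the `manifests_per_server` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoreFile:
    name: str
    reference: Optional[str] = None  # set only for reference files


@dataclass
class FamilyFiles:
    family_name: bytes
    store_files: List[StoreFile] = field(default_factory=list)


@dataclass
class SnapshotRegionManifest:
    region_info: str  # stand-in for the RegionInfo message
    family_files: List[FamilyFiles] = field(default_factory=list)


def manifests_per_server(
    assignments: Dict[str, str],
    regions: Dict[str, SnapshotRegionManifest],
) -> Dict[str, List[SnapshotRegionManifest]]:
    """Group region manifests by hosting RS; each group becomes one manifest file."""
    per_server: Dict[str, List[SnapshotRegionManifest]] = {}
    for region_name, server in assignments.items():
        per_server.setdefault(server, []).append(regions[region_name])
    # Hypothetical naming: regionManifest.<n>, numbered in sorted server order.
    return {
        f"regionManifest.{n}": per_server[server]
        for n, server in enumerate(sorted(per_server))
    }


# Three regions hosted on two region servers (all names are made up).
regions = {
    name: SnapshotRegionManifest(
        region_info=name,
        family_files=[FamilyFiles(b"cf", [StoreFile(f"{name}-hfile")])],
    )
    for name in ("r1", "r2", "r3")
}
assignments = {"r1": "rs-a", "r2": "rs-b", "r3": "rs-a"}
manifest_files = manifests_per_server(assignments, regions)
```

With this grouping, the number of files written for the regions equals the number of region servers involved, regardless of how many store files each region has.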
                
> Snapshot Manifest file instead of multiple empty files
> ------------------------------------------------------
>
>                 Key: HBASE-7987
>                 URL: https://issues.apache.org/jira/browse/HBASE-7987
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Matteo Bertozzi
>
> Currently, taking a snapshot means creating one empty file for each file in the source
> table directory, plus copying the .regioninfo file for each region, the table descriptor file,
> and a snapshotInfo file.
> During restore or snapshot verification we traverse the filesystem (fs.listStatus())
> to find the snapshot files, and we open the .regioninfo files to get the region information.
> To avoid hammering the NameNode and having lots of empty files, we can use a manifest
> file that contains the list of files and the information that we need.
> To keep the RS parallelism that we have, each RS can write its own manifest.
> {code}
> // Field numbers are illustrative only; Type, RegionInfo and Reference
> // are assumed to be defined elsewhere.
> message SnapshotDescriptor {
>   required string name = 1;
>   optional string table = 2;
>   optional int64 creationTime = 3;
>   optional Type type = 4;
>   optional int32 version = 5;
> }
> message SnapshotRegionManifest {
>   optional int32 version = 1;
>   required RegionInfo regionInfo = 2;
>   repeated FamilyFiles familyFiles = 3;
>   message StoreFile {
>     required string name = 1;
>     optional Reference reference = 2;
>   }
>   message FamilyFiles {
>     required bytes familyName = 1;
>     repeated StoreFile storeFiles = 2;
>   }
> }
> {code}
> {code}
> /hbase/.snapshot/<snapshotName>
> /hbase/.snapshot/<snapshotName>/snapshotInfo
> /hbase/.snapshot/<snapshotName>/<tableName>
> /hbase/.snapshot/<snapshotName>/<tableName>/tableInfo
> /hbase/.snapshot/<snapshotName>/<tableName>/regionManifest(.n)
> {code}
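
To give a feel for the saving, here is a back-of-the-envelope sketch that counts the files each layout creates in the snapshot directory. The counting rules follow the description above; the function names and the cluster sizes are made up for illustration.

```python
def files_created_old(regions, families, store_files_per_family):
    """Current layout: one empty file per store file, one .regioninfo per region,
    plus the table descriptor and the snapshotInfo file."""
    store_files = regions * families * store_files_per_family
    return store_files + regions + 2


def files_created_manifest(region_servers):
    """Proposed layout: one regionManifest per RS, plus tableInfo and snapshotInfo."""
    return region_servers + 2


# Hypothetical table: 100 regions, 3 families, 5 store files per family, 10 RS.
old_count = files_created_old(regions=100, families=3, store_files_per_family=5)
new_count = files_created_manifest(region_servers=10)
print(old_count, new_count)  # 1602 12
```

Under these assumed sizes, the NameNode handles 12 file creations instead of 1602, and restore or verification reads the manifests rather than listing every region and family directory.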

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
