Date: Wed, 16 Apr 2014 21:52:19 +0000 (UTC)
From: "Enis Soztutar (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-7987) Snapshot Manifest file instead of multiple empty files

    [ https://issues.apache.org/jira/browse/HBASE-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971996#comment-13971996 ]

Enis Soztutar commented on HBASE-7987:
--------------------------------------

bq. I see this jira more as a "improve the current situation", without having to change much of the current design.

Yeah, I really think that we should switch to having the meta table track the file references. This might be ok for the short term if it does not add a bunch more complexity.

> Snapshot Manifest file instead of multiple empty files
> ------------------------------------------------------
>
>                 Key: HBASE-7987
>                 URL: https://issues.apache.org/jira/browse/HBASE-7987
>             Project: HBase
>          Issue Type: Improvement
>          Components: snapshots
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>             Fix For: 1.0.0
>
>         Attachments: HBASE-7987-v0.patch, HBASE-7987-v1.patch, HBASE-7987-v2.patch, HBASE-7987-v2.sketch, HBASE-7987-v3.patch, HBASE-7987-v4.patch, HBASE-7987.sketch
>
>
> Currently taking a snapshot means creating one empty file for each file in the source table directory, plus copying the .regioninfo file for each region, the table descriptor file and a snapshotInfo file.
> During the restore or snapshot verification we traverse the filesystem (fs.listStatus()) to find the snapshot files, and we open the .regioninfo files to get the information.
> To avoid hammering the NameNode and having lots of empty files, we can use a manifest file that contains the list of files and information that we need.
> To keep the RS parallelism that we have, each RS can write its own manifest.
> {code}
> message SnapshotDescriptor {
>   required string name;
>   optional string table;
>   optional int64 creationTime;
>   optional Type type;
>   optional int32 version;
> }
> message SnapshotRegionManifest {
>   optional int32 version;
>   required RegionInfo regionInfo;
>   repeated FamilyFiles familyFiles;
>   message StoreFile {
>     required string name;
>     optional Reference reference;
>   }
>   message FamilyFiles {
>     required bytes familyName;
>     repeated StoreFile storeFiles;
>   }
> }
> {code}
> {code}
> /hbase/.snapshot/
> /hbase/.snapshot//snapshotInfo
> /hbase/.snapshot//
> /hbase/.snapshot///tableInfo
> /hbase/.snapshot///regionManifest(.n)
> {code}
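
To make the manifest idea in the issue description concrete, here is a minimal sketch in Python. It is not the HBase implementation or any of the attached patches: it uses a local directory in place of HDFS, JSON in place of protobuf, and a plain region name in place of RegionInfo. The names `RegionManifest`, `build_region_manifest`, `write_manifest`, `read_manifest` and the `regionManifest.<region>` file name are all hypothetical and chosen only for illustration. The point it shows is that one manifest per region replaces one empty marker file per store file, so a restore or verification reads a single file instead of listing directories.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class StoreFile:
    name: str
    reference: Optional[str] = None  # placeholder for the sketch's Reference message


@dataclass
class FamilyFiles:
    family_name: str
    store_files: List[StoreFile] = field(default_factory=list)


@dataclass
class RegionManifest:
    region_name: str  # assumption: stands in for the RegionInfo message
    family_files: List[FamilyFiles] = field(default_factory=list)
    version: int = 1


def build_region_manifest(region_dir: Path) -> RegionManifest:
    """Walk one region directory once and record its store files per family."""
    manifest = RegionManifest(region_name=region_dir.name)
    for family_dir in sorted(p for p in region_dir.iterdir() if p.is_dir()):
        files = [StoreFile(name=f.name)
                 for f in sorted(family_dir.iterdir()) if f.is_file()]
        manifest.family_files.append(FamilyFiles(family_dir.name, files))
    return manifest


def write_manifest(manifest: RegionManifest, snapshot_dir: Path) -> Path:
    """Write a single manifest file instead of one empty file per store file."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / f"regionManifest.{manifest.region_name}"
    path.write_text(json.dumps(asdict(manifest)))
    return path


def read_manifest(path: Path) -> RegionManifest:
    """Recover the file list from the manifest without listing any directory."""
    raw = json.loads(path.read_text())
    families = [
        FamilyFiles(f["family_name"], [StoreFile(**s) for s in f["store_files"]])
        for f in raw["family_files"]
    ]
    return RegionManifest(raw["region_name"], families, raw["version"])


def demo() -> List[str]:
    """Build a toy region on local disk, snapshot it, and read the manifest back."""
    with tempfile.TemporaryDirectory() as tmp:
        region = Path(tmp) / "table" / "region-a"
        for family, names in {"cf1": ["hfile1", "hfile2"], "cf2": ["hfile3"]}.items():
            (region / family).mkdir(parents=True)
            for name in names:
                (region / family / name).write_bytes(b"data")
        path = write_manifest(build_region_manifest(region), Path(tmp) / "snapshot")
        restored = read_manifest(path)
        return [f"{fam.family_name}/{sf.name}"
                for fam in restored.family_files for sf in fam.store_files]


if __name__ == "__main__":
    print(demo())
```

In this sketch each region writes its own manifest independently, which loosely mirrors the "each RS can write its own manifest" point: writers never contend on a shared file, and the reader only needs to open one file per region.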