Date: Sat, 7 Jan 2012 18:39:39 +0000 (UTC)
From: "Joe Kraska (Commented) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <884526496.18855.1325961579349.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-233) Support for snapshots

[ https://issues.apache.org/jira/browse/HDFS-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182056#comment-13182056 ]

Joe Kraska commented on
HDFS-233:
---------------------------------

Reviewing the comments and noting the data warehousing feature requests and the like, I thought I would comment on the snapshot feature from the more pragmatic perspective of simple, responsible data stewardship.

By and large, the most important features of snapshots are being able to:

1. Take them live.
2. Take them economically: a snapshot should not require particularly large amounts of space.
3. Keep a dozen or so (and often fewer).
4. Schedule them (hourly, daily, weekly, with emphasis on the latter two).
5. Selectively restore portions of the tree after user- or program-caused erasure or damage.
6. Quickly restore either a subportion of the tree or an entire volume.

The above set of features is about fundamental data protection, cost, and restore time objectives. These features are directly related to economical data stewardship, and are considered the first line of defense for data protection in many enterprises today. I.e., we data stewards prefer these features over tape restores (although we also use tape, we hate it).

*AFTER* the above, space-efficient *writable* snapshots are interesting. This is because there are test applications that work on current data sets, where touching the master data set is a complete no-no but the application needs to make trial changes. These snapshots are often made, modified for a while, then deleted.

You will want snapshots to have minimal performance impact, because the assumption should be that the scheduled snapshot system is ALWAYS used. The one exception to this is static read-only data, where a single manual snapshot is recorded just once. Everything else will have something like 2 daily and 2 weekly snapshots going all the time. Some enterprises will also use hourly snapshots scheduled every 6 hours or so and retain about a day of those...
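To make the retention numbers above concrete, here is a minimal sketch of how a scheduler might decide which snapshots to prune. It is not an HDFS API: the `Snapshot` type, the `RETENTION` table, and `select_expired` are all hypothetical names invented for illustration, and the counts (4 six-hourly, 2 daily, 2 weekly) are just one reading of the schedule described above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention policy: how many snapshots to keep per schedule tier.
# Assumed counts: about a day of 6-hourly snapshots, 2 daily, 2 weekly.
RETENTION = {"hourly": 4, "daily": 2, "weekly": 2}


@dataclass(frozen=True)
class Snapshot:
    """Illustrative record of one scheduled snapshot (not an HDFS type)."""
    tier: str            # "hourly", "daily", or "weekly"
    taken_at: datetime   # when the snapshot was taken


def select_expired(snapshots, retention=RETENTION):
    """Return the snapshots that fall outside the retention count of their tier."""
    expired = []
    for tier, keep in retention.items():
        # Sort newest first, so everything past the first `keep` entries is expired.
        in_tier = sorted(
            (s for s in snapshots if s.tier == tier),
            key=lambda s: s.taken_at,
            reverse=True,
        )
        expired.extend(in_tier[keep:])
    return expired
```

With four daily snapshots on hand, this policy would flag the two oldest for deletion and leave the two newest alone. Snapshots whose tier is not listed in the policy (for example, a one-off manual snapshot of static data) are never selected.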

As a side note (and no offense to the Hadoop community), I regard all shared storage as defective for data stewardship purposes if it does not have the above features (except writable snapshots, that's candy), and I am not the least bit alone. Any data protection strategy that says "go to tape for that" as its first offer is... onerous. While the following is merely my opinion, I feel pretty sure that the rise of the enterprise NAS appliance (e.g., NetApp et al.) is at least partly due to the default nature of snapshot protection on those devices. Food for thought.

> Support for snapshots
> ---------------------
>
>                 Key: HDFS-233
>                 URL: https://issues.apache.org/jira/browse/HDFS-233
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: Snapshots.pdf, Snapshots.pdf
>
>
> Support HDFS snapshots. It should support creating snapshots without shutting down the file system. Snapshot creation should be lightweight and a typical system should be able to support a few thousand concurrent snapshots. There should be a way to surface (i.e. mount) a few of these snapshots simultaneously.