hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-2802) Support for RW/RO snapshots in HDFS
Date Mon, 29 Oct 2012 00:19:12 GMT

     [ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-2802:

    Attachment: HDFSSnapshotsDesign.pdf

Hi all, attached please find a somewhat different design for implementing snapshot support
in HDFS that a few others and I have discussed. Please have a look at it.

Though this design differs somewhat from the previous design posted by Nicholas, I don't think
the two designs are insurmountably far apart. Though I certainly don't expect to switch to this
design wholesale, I would like to see if we can come up with a hybrid design which incorporates
some aspects of both. Let me try to outline where I see the designs differing, and suggest
ways we can move forward with a hybrid design.

# *Efficiency of snapshot creation.* In the design posted by Nicholas, creation of a snapshot
is O(n) in terms of the number of files/directories captured by the snapshot, both in terms
of time and space efficiency. The design proposed in this document would be O(1) at snapshot
creation time, and then copy-on-write thereafter for files/directories which are modified
after the snapshot is created. This is accomplished by assigning unique, increasing integer
IDs to snapshots and giving each INode a start_snap and end_snap ID to denote which snapshots
the INode should be a part of. I'm not wedded to the precise design described in this document,
but it seems like a reasonable design to me, so I'd like to consider this for the design to
implement HDFS-4103 (Support O(1) snapshot creation).
# *Support for subdirectory snapshots.* The design posted by Nicholas allows for individual
subdirectories of an HDFS namespace to be snapshotted by introducing "snapshottable directories."
The design proposed in this document would only support snapshots at the root level of the
file system. I think an easy way to produce a hybrid between these two designs would be to
stick with the "snapshottable directory" system described in the document posted by Nicholas,
and store the snapshot ID info at that INodeDirectory, instead of globally for the whole file
system as is described in the document I've just posted. Such a scheme will allow both for
efficient snapshot creation and creation of snapshots of subdirectories of the file system.
# *Support for non-super users to create snapshots.* The design posted by Nicholas allows
for non-super users to create snapshots. The scheme described in the document I've just posted
would only allow super users to create snapshots, in instances where administrators want tight
control over the snapshots in their system. I propose we stick with the design described in
the document posted by Nicholas, but allow for user-initiated snapshot creation to be optionally
disabled by the administrator, either globally or per-snapshottable directory. This should
allow for both use cases simultaneously.
# *Materialization of snapshots.* The scheme described in the document posted by Nicholas
allows for the state of the FS in a snapshot to only be accessed from the snapshot root, i.e.
the snapshottable directory, and allows for snapshots to be created with arbitrary names.
The scheme described by the document I've just posted would have the return value of ClientProtocol#getListing
modified on the fly by the NameNode so that a ".snapshots" directory will appear to be present
in every directory which has a snapshot available for it, with the available snapshots listed
under this "directory" by their snapshot ID. This is similar to the user experience that users
of WAFL file systems are accustomed to, and so should be familiar to many users of FS snapshots.
I'd like us to consider going with this scheme.
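
To make the start_snap/end_snap idea from the first point concrete, here is a minimal sketch of how O(1) snapshot creation with copy-on-write could work. It is an illustration only, not code from the attached design document or from HDFS: the class and method names are hypothetical, the namespace is a flat list of named entries rather than a directory tree, and the exact ID semantics (a new INode gets start_snap = latest snapshot ID + 1, and end_snap is the last snapshot that still contains it) are my assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the design document or from HDFS.
final class SnapshotSketch {
    // Assumed sentinel meaning "still part of the live namespace".
    static final int LIVE = Integer.MAX_VALUE;

    static final class INode {
        final String name;
        final String data;
        final int startSnap;   // first snapshot ID that can contain this INode
        int endSnap = LIVE;    // last snapshot ID that contains it (inclusive)

        INode(String name, String data, int startSnap) {
            this.name = name;
            this.data = data;
            this.startSnap = startSnap;
        }

        boolean isLive() { return endSnap == LIVE; }

        boolean inSnapshot(int id) { return startSnap <= id && id <= endSnap; }
    }

    private final List<INode> inodes = new ArrayList<>();
    private int latestSnapshotId = 0;

    // O(1): taking a snapshot only bumps a counter; no INodes are visited or copied.
    public int createSnapshot() { return ++latestSnapshotId; }

    // Copy-on-write: the old version is retired (not overwritten) and a new INode is added.
    public void write(String name, String data) {
        delete(name);
        inodes.add(new INode(name, data, latestSnapshotId + 1));
    }

    public void delete(String name) {
        // A live INode created after the latest snapshot is in no snapshot, so drop it outright.
        inodes.removeIf(n -> n.isLive() && n.name.equals(name) && n.startSnap > latestSnapshotId);
        // Otherwise keep it for the snapshots that captured it and close its ID range.
        for (INode n : inodes) {
            if (n.isLive() && n.name.equals(name)) {
                n.endSnap = latestSnapshotId;
            }
        }
    }

    // Returns the content as of the given snapshot, or null if absent or the ID is unknown.
    public String read(String name, int snapshotId) {
        if (snapshotId < 1 || snapshotId > latestSnapshotId) {
            return null;
        }
        for (INode n : inodes) {
            if (n.name.equals(name) && n.inSnapshot(snapshotId)) {
                return n.data;
            }
        }
        return null;
    }

    // Returns the content in the live namespace, or null if the name does not exist.
    public String readLive(String name) {
        for (INode n : inodes) {
            if (n.isLive() && n.name.equals(name)) {
                return n.data;
            }
        }
        return null;
    }

    public int inodeCount() { return inodes.size(); }
}
```

In this sketch, writing a file, taking snapshot 1, overwriting the file, and taking snapshot 2 leaves two INodes for that name, each visible only under the snapshot ID range that captured it. For the hybrid proposed in the second point, the counter and the INode list would presumably live at each snapshottable INodeDirectory rather than globally.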

Please consider this proposal. I'd love to discuss this further at the design meeting later
this week as previously mentioned by Suresh. By the way, can we nail down the precise date/time
for that meeting? Sanjay mentioned to me offline that it would probably be on Wednesday, but
I haven't heard anything beyond that. I'd be happy to offer up space in the Cloudera office,
if that would be helpful. Let me know.

Thanks everyone.
> Support for RW/RO snapshots in HDFS
> -----------------------------------
>                 Key: HDFS-2802
>                 URL: https://issues.apache.org/jira/browse/HDFS-2802
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>         Attachments: HDFSSnapshotsDesign.pdf, snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
> Snapshots are point in time images of parts of the filesystem or the entire filesystem.
Snapshots can be a read-only or a read-write point in time copy of the filesystem. There are
several use cases for snapshots in HDFS. I will post a detailed write-up soon with more

