hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
Date Mon, 22 Oct 2012 18:16:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481571#comment-13481571 ]

Aaron T. Myers commented on HDFS-2802:

bq. Aaron, by O(N) where N = # of files + # of directories, I guess you mean O(N) snapshot creation
time and O(N) memory usage at snapshot creation. Snapshot creation can be optimized by lazy
INode creation: no INode is created at snapshot creation time, and only the INodes modified after
the snapshot will be created. Then it becomes O(1) snapshot creation time and O(1) memory usage
at snapshot creation. The design does not exclude this optimization.
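To make the complexity difference concrete, here is a minimal Python sketch. It is not HDFS code: it assumes a flat dict (path -> metadata) stands in for the NameNode's INode tree, and the class and method names (EagerNamespace, LazyNamespace, take_snapshot, read_snapshot) are hypothetical. The eager version copies every entry when a snapshot is taken (O(N) time and memory at creation); the lazy version uses copy-on-write, so taking a snapshot only appends an empty diff, and an entry's previous value is saved the first time it is modified afterward.

```python
# Hypothetical sketch: a flat dict (path -> metadata) stands in for the
# NameNode's INode tree. None of these names are HDFS APIs.

_MISSING = object()  # marks a path that did not exist when the snapshot was taken


class EagerNamespace:
    """Copies every entry at snapshot time: O(N) time and memory per snapshot."""

    def __init__(self):
        self.inodes = {}
        self.snapshots = []  # one full copy per snapshot
        self.copied = 0      # entries copied on behalf of snapshots so far

    def write(self, path, meta):
        self.inodes[path] = meta

    def take_snapshot(self):
        self.snapshots.append(dict(self.inodes))  # copies all N entries up front
        self.copied += len(self.inodes)
        return len(self.snapshots) - 1

    def read_snapshot(self, snap_id, path):
        return self.snapshots[snap_id].get(path)


class LazyNamespace:
    """Copy-on-write: snapshot creation is O(1); an entry's old value is saved
    only the first time it changes after the latest snapshot."""

    def __init__(self):
        self.inodes = {}
        self.snapshots = []  # per snapshot: path -> value at snapshot time (modified paths only)
        self.copied = 0

    def take_snapshot(self):
        self.snapshots.append({})  # nothing is copied here
        return len(self.snapshots) - 1

    def write(self, path, meta):
        if self.snapshots:
            diff = self.snapshots[-1]
            if path not in diff:  # first modification since the latest snapshot
                diff[path] = self.inodes.get(path, _MISSING)
                self.copied += 1
        self.inodes[path] = meta

    def read_snapshot(self, snap_id, path):
        # The first diff at or after snap_id that recorded this path holds its
        # value as of snap_id; if none did, the path is unchanged since then.
        for diff in self.snapshots[snap_id:]:
            if path in diff:
                value = diff[path]
                return None if value is _MISSING else value
        return self.inodes.get(path)


if __name__ == "__main__":
    for ns in (EagerNamespace(), LazyNamespace()):
        for i in range(1000):
            ns.write(f"/f{i}", i)
        snap = ns.take_snapshot()
        ns.write("/f0", "changed")
        ns.write("/new", "x")
        print(type(ns).__name__, ns.copied, ns.read_snapshot(snap, "/f0"))
```

Running it prints 1000 copied entries for the eager namespace and 2 for the lazy one, while both still report the snapshotted value 0 for /f0. The trade-off in this particular sketch is that reading an old snapshot scans the later diffs, so cost moves from snapshot creation to lookup; a real design would need a more careful structure for that.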

On page 7 the design document says the following:

The memory usage is linear to the number of INode snapped because it copies all INodes when
a snapshot is created.

Later on page 7, the design document discusses how this might be optimized by supporting
offline snapshots on disk (i.e., moving the snapshot metadata out of the NN's heap) and by
copying all the INodes in parallel using several threads.

This is what I am referring to. I fully support a design which implements O(1) snapshot creation
time and O(1) memory usage, but the currently proposed design does not describe such a thing.
Instead of implementing an inefficient design and then optimizing it, we should come up with
and implement an efficient design.

> Support for RW/RO snapshots in HDFS
> -----------------------------------
>                 Key: HDFS-2802
>                 URL: https://issues.apache.org/jira/browse/HDFS-2802
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>         Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
> Snapshots are point-in-time images of parts of the filesystem or the entire filesystem.
> Snapshots can be a read-only or a read-write point-in-time copy of the filesystem. There are
> several use cases for snapshots in HDFS. I will post a detailed write-up soon with more

