hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
Date Sat, 20 Oct 2012 00:02:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480547#comment-13480547 ]

Suresh Srinivas commented on HDFS-2802:

Thanks for the comments, guys.

bq. In some of the most commercially popular systems which implement snapshots, snapshots
do not count against the disk quotas
How do they handle disk quota use when the original file is deleted and only snapshots exist?
That is the reason why counting snapshots against the disk quota makes sense.
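
To make the quota question concrete, here is a minimal toy model. It is not HDFS code and
every name in it (QuotaDir, pinned, and so on) is hypothetical. It only assumes that a block
still referenced by a snapshot keeps occupying disk after the live file is deleted, so it
keeps counting against the quota.

```python
import itertools


class QuotaDir:
    """Toy directory with a disk quota; hypothetical, not the HDFS implementation."""

    def __init__(self, quota_bytes):
        self.quota_bytes = quota_bytes
        self._ids = itertools.count()
        self.sizes = {}      # block id -> size in bytes
        self.live = {}       # live file name -> block id
        self.pinned = set()  # block ids referenced by at least one snapshot

    def usage(self):
        # A block counts once, whether a live file, a snapshot, or both reference it.
        referenced = set(self.live.values()) | self.pinned
        return sum(self.sizes[b] for b in referenced)

    def create(self, name, size):
        if name in self.live:
            raise FileExistsError(name)
        if self.usage() + size > self.quota_bytes:
            raise ValueError("disk quota exceeded")
        block = next(self._ids)
        self.sizes[block] = size
        self.live[name] = block

    def snapshot(self):
        # Pin every block that is live right now.
        self.pinned |= set(self.live.values())

    def delete(self, name):
        block = self.live.pop(name)
        if block not in self.pinned:
            # No snapshot needs the block, so its space is really freed.
            del self.sizes[block]


d = QuotaDir(quota_bytes=100)
d.create("a", 60)
d.snapshot()
d.delete("a")
print(d.usage())  # 60: the snapshot still holds the deleted file's block
```

If snapshots were exempt from the quota, the 60 bytes above would be charged to nobody once
the original file is gone.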

bq. First, I'm concerned with the O(# of files + # of directories) nature of this design,
both in terms of time taken to create a snapshot and the NN memory resources consumed.
I agree with you on this. We wanted to begin with this approach and then optimize its memory
use further. The initial patch uploaded here attempted premature optimization of both memory
and snapshot creation time, and that made the code really complicated. Reducing this cost is
a definite goal, and we will update that part of the design as we continue to work. This is
covered in the open issues/future work section.
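
As a rough illustration of where an O(# of files + # of directories) cost can come from,
the sketch below snapshots a namespace by copying the whole inode tree. This is only an
assumed stand-in for a naive approach, not the design in the attached document, and INode
and copy_tree are hypothetical names.

```python
class INode:
    """Toy namespace node: children is None for a file, a list for a directory."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children


def copy_tree(node, counter):
    """Copy the subtree rooted at node; counter[0] tallies the nodes visited."""
    counter[0] += 1
    if node.children is None:
        return INode(node.name)
    return INode(node.name, [copy_tree(c, counter) for c in node.children])


# Two directories ("/" and "d") and three files.
root = INode("/", [
    INode("d", [INode("f1"), INode("f2")]),
    INode("f3"),
])
visited = [0]
snapshot_root = copy_tree(root, visited)
print(visited[0])  # 5 = 2 directories + 3 files
```

Both the time to take the snapshot and the memory held by the copy grow with the number of
nodes visited, which is the concern raised in the quoted comment.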

comment 1:
Agree with this part. As we continue the work, we can make a decision on this. For supporting
RW, let's not make the design/implementation more complicated.

comment 2:
Will address this as we continue to add more details to the design in the next update.

Comment 3, 6:
I want to make sure you understand this is early design and we will continue to add more details.
I think some of the questions will be answered by how this works:
- Admin can mark directories as snapshottable using CLI
- Users can then create snapshots for these directories using CLI/API. A snapshot has a
snapshot name, and the name is unique for a given snapshot root.
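
The two rules above can be sketched as a small toy model. The names (SnapshotManager,
allow_snapshot, create_snapshot) are hypothetical and are not taken from the patch or the
design document; the sketch only encodes that an admin marks a directory snapshottable and
that snapshot names are unique per snapshot root.

```python
class SnapshotManager:
    """Toy bookkeeping for snapshottable directories; hypothetical names throughout."""

    def __init__(self):
        self.snapshots = {}  # snapshot root path -> set of snapshot names

    def allow_snapshot(self, root, is_admin):
        # Only an admin may mark a directory as snapshottable.
        if not is_admin:
            raise PermissionError("only an admin can mark a directory snapshottable")
        self.snapshots.setdefault(root, set())

    def create_snapshot(self, root, name):
        if root not in self.snapshots:
            raise ValueError(f"{root} is not snapshottable")
        if name in self.snapshots[root]:
            raise ValueError(f"snapshot {name!r} already exists under {root}")
        self.snapshots[root].add(name)


mgr = SnapshotManager()
mgr.allow_snapshot("/data", is_admin=True)
mgr.allow_snapshot("/logs", is_admin=True)
mgr.create_snapshot("/data", "s1")
mgr.create_snapshot("/logs", "s1")  # same name is fine under a different snapshot root
```

Creating a second "s1" under /data, or any snapshot under a directory that was never marked
snapshottable, raises an error in this sketch.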

comment 4:
If you look at snapshot implementations in other systems, snapshots are taken at the volume
level. That is the parallel we are talking about.

Comment 5, Comment 7, comment 10:
As regards consistency (comment 7), a system where the snapshot is taken at the namespace
level without involving the data layer cannot provide a strong consistency guarantee. I also
think it may not be relevant where the writers are different from the client that is taking
the snapshot. I am not sure what guarantee such a client can expect or depend on, given that
the writers are separate. We could discuss this during the design review. Based on discussion
with a few HBase folks, I also think they should be okay with it; it is something to discuss
with them. I am also not clear on their dependency on HDFS in HBASE-6055.

comment 8:
This could change during implementation if we think access time may not be that important
to maintain.

comment 9:
Agreed. I am leaning towards allowing it.

comment 11:
Will add use cases.

comment 12:
See the volume comment above (comment 4); the document also covers this to some extent. We
could discuss this further if the document is not clear.

> Support for RW/RO snapshots in HDFS
> -----------------------------------
>                 Key: HDFS-2802
>                 URL: https://issues.apache.org/jira/browse/HDFS-2802
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>         Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
> Snapshots are point in time images of parts of the filesystem or the entire filesystem.
Snapshots can be a read-only or a read-write point in time copy of the filesystem. There are
several use cases for snapshots in HDFS. I will post a detailed write-up soon with more

