hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
Date Fri, 19 Oct 2012 23:06:15 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480503#comment-13480503 ]

Aaron T. Myers commented on HDFS-2802:

Hey guys, thanks a lot for posting an updated design document. I have one high-level concern
about the design, and a number of specific comments / questions about the design document.

First, I'm concerned with the O(# of files + # of directories) nature of this design, both
in terms of time taken to create a snapshot and the NN memory resources consumed. 

It seems to me that this will result in snapshot creation/retention being sufficiently costly
as to make creating snapshots with this design not a viable option for large HDFS instances.
I think another design requirement for this work should be "snapshot creation must be sufficiently
fast as to be unnoticeable to clients" (an attempted definition of "reasonable" per high-level
requirement #4 on page 1.) I don't think this design will satisfy such a requirement in large
HDFS instances. Also, many large HDFS instances that I'm aware of already run with very large
NN heaps, and a snapshot design which results in making copies of large portions of the working
set will not be viable in these situations. At least, I think this design does not address
the #3 high-level requirement on page 1 of "support for a reasonable number of snapshots."
In deployments where the NameNode is already running with a large working set, this design
might only allow for a single snapshot of the root of the file system, or none at all.

I think this design document is a very good start, but we really must figure out a way to
create and retain snapshots in a more efficient manner, both in terms of time to create the
snapshot and memory overhead to retain the snapshot, in order for the snapshot solution to
be viable for HDFS. Having an O(# of files + # of directories) system will be acceptable only
for the smallest HDFS installations. I think that creation of a snapshot should be
either O(1) or worst case O(depth of file system tree). There are many precedents for file
systems supporting more efficient snapshot creation than O(# of files + # of directories),
e.g. WAFL, ZFS, BTRFS, etc.
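
To make the O(1) creation target concrete, here is a minimal sketch of one well-known technique, path copying over an immutable tree. It is a toy illustration only: it is not how WAFL, ZFS, or BTRFS are implemented, it is not a proposal for the NameNode's data structures, and every name in it (`Node`, `write`, `ToyNamespace`) is hypothetical.

```python
class Node:
    """A directory or file node that is never mutated after construction."""

    def __init__(self, children=None, data=None):
        self.children = children or {}  # name -> Node
        self.data = data                # file contents, or None for a directory


def write(node, parts, data):
    """Return a new subtree with `parts` set to `data`, copying only nodes on that path."""
    if not parts:
        return Node(data=data)
    children = dict(node.children)  # copy one child map; sibling subtrees stay shared
    child = node.children.get(parts[0], Node())
    children[parts[0]] = write(child, parts[1:], data)
    return Node(children=children, data=node.data)


def read(node, parts):
    """Follow `parts` down from `node` and return the data stored there."""
    for name in parts:
        node = node.children[name]
    return node.data


def split(path):
    """Turn '/user/a/f' into ['user', 'a', 'f']."""
    return path.strip("/").split("/")


class ToyNamespace:
    def __init__(self):
        self.root = Node()
        self.snapshots = {}

    def put(self, path, data):
        # Cost grows with the depth of `path`, not with the size of the tree.
        self.root = write(self.root, split(path), data)

    def snapshot(self, name):
        # O(1): the snapshot is just a retained reference to the current root.
        self.snapshots[name] = self.root
```

In this sketch, taking a snapshot copies nothing, and a later write copies only the nodes on the path to the changed file (plus, in this toy, each of those nodes' child maps), so memory overhead tracks what changed after the snapshot rather than the total number of files and directories.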

Second, comments on the design document itself:

# I see that you're now considering read/write snapshots as an optional requirement. Per several
of the comments on HDFS-233, it seems to me that supporting writable snapshots is an extra
complexity that many folks aren't actually interested in. I suggest we explicitly punt on
support for writable snapshots, and declare that snapshots are completely immutable. I think
doing so may have the potential to allow us to make some simplifying design decisions.
# On page 2 the design mentions that "it should be possible to extend the current design to
materialize the snapshot metadata and migrate it to outside the NameNode." I didn't see any
other discussion of this in the rest of the document. Can you perhaps expand on what you mean
by this?
# On page 2 you mention that there "are snapshot root directories that are configured by the
system administrator to allow snapshots." How are these configured? Is it something that can
be dynamically added to a running NN? Or would it need to be configured at NN startup time?
# In several places (e.g. the footnote on page 2, use case 2 on page 6) the design document
refers to file system "volumes," in particular "Snapshots are created at the volume level
simplifying administration." What "volume" are you referring to here?
# I think that the "detailed requirements" section on page 4 is missing a critical requirement:
the snapshot must be consistent from the perspective of individual clients. The requirements
state that the snapshot must be atomic, but not consistent. The consistency of the snapshot
really must be well-defined and strong. For example, it would be unacceptable if files restored
from an HDFS snapshot of a running hbase.rootdir resulted in a corrupted HBase instance.
# On page 4 you mention that a snapshot will have "a unique snapshot name for a given path."
Can you expand upon that? How is this name created? Why is this necessary in addition to
the "path where the snapshot is created" which as far as I can tell should also serve the
purpose of uniquely identifying the snapshot?
# I think that the solution described on page 5 of the document for the length of files being
written does not satisfy the requirement that I mentioned above that the snapshot must be
consistent. In particular, I think I can construct a scenario wherein a client performs an
NN-only metadata operation (A), then writes and hflushes some data (B), and then performs
another metadata operation (C) right before a snapshot is created, and the snapshot ends up
containing both metadata operations but not the data hflush'ed to the DNs, i.e. A and C but
not B. This would result in the snapshot representing a moment in time that never existed
from the point of view of that client. We can continue the discussion of this particular issue
more on HDFS-3960, if you'd like.
# I'm glad that the document discusses atime - that is not something that I had considered
in my thinking on HDFS snapshots. I am a little leery, however, of tracking atime at all in
what is ostensibly a read-only snapshot. Does anyone know what other file systems that support
read-only snapshots do with regard to atime?
# Regarding open question #2 on page 7, I would think that this should be a hard requirement
if we go with this design, and the main motivation should be taking snapshots of different
parts of the tree on different schedules. For example, an administrator may want to schedule
a nightly snapshot of the whole FS, but hourly snapshots of their /user directories.
# Regarding open question #4 on page 7, I feel confident that this design does not currently
produce consistent HBase snapshots, because of the issue I described in #7 above.
# One question regarding the user experience that I don't see described in the document: will
creating a snapshot require super user privileges? Or can any user create a snapshot of a
subdirectory? If the latter, what permissions are required to create a snapshot? What if the
user doesn't have permissions on some files under the subtree of the snapshot target? Does
this result in an incomplete snapshot? Or a completely failed snapshot? My personal inclination
is to limit snapshot creation to super users only, as a simplification.
# One high-level comment on the document: I'm a little leery of introducing this new concept
of "snapshottable directories." I'm not aware of any precedent in other file systems for this
sort of restriction, and I fear that the concept may be confusing for administrators and operators
of HDFS.
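
The A / B / C scenario from comment #7 above can be sketched as a small simulation. This is a toy model under a stated assumption: a snapshot captures only the NameNode's view of the namespace and file lengths, and an hflush makes data durable on the DataNodes without updating the length recorded at the NameNode. The class and function names are hypothetical and do not correspond to HDFS code or to the mechanism described in the design document.

```python
class ToyCluster:
    """Tracks what the NameNode knows separately from what the DataNodes hold."""

    def __init__(self):
        self.nn_namespace = set()  # paths known to the NameNode
        self.nn_length = {}        # file length as recorded by the NameNode
        self.dn_length = {}        # bytes actually hflushed to the DataNodes

    def mkdir(self, path):
        self.nn_namespace.add(path)  # NN-only metadata operation

    def create(self, path):
        self.nn_namespace.add(path)
        self.nn_length[path] = 0
        self.dn_length[path] = 0

    def write_and_hflush(self, path, nbytes):
        # Assumption: the NN-recorded length is not updated by an hflush.
        self.dn_length[path] += nbytes

    def snapshot(self):
        # Assumption: the snapshot copies only NN-side state.
        return {"namespace": set(self.nn_namespace), "length": dict(self.nn_length)}


def reflects_a_prefix(visible):
    """True if the visible operations form a prefix of the client's history."""
    seen_missing = False
    for op_visible in visible:
        if not op_visible:
            seen_missing = True
        elif seen_missing:
            return False  # a later operation is visible but an earlier one is not
    return True


def run_scenario():
    cluster = ToyCluster()
    cluster.create("/log")
    cluster.mkdir("/a")                    # A: metadata operation
    cluster.write_and_hflush("/log", 100)  # B: data hflushed to the DNs
    cluster.mkdir("/c")                    # C: metadata operation
    snap = cluster.snapshot()
    return [
        "/a" in snap["namespace"],    # is A visible in the snapshot?
        snap["length"]["/log"] >= 100,  # is B visible in the snapshot?
        "/c" in snap["namespace"],    # is C visible in the snapshot?
    ]
```

Under these assumptions `run_scenario()` reports A and C as visible and B as missing, and `reflects_a_prefix` rejects that combination: the snapshot does not correspond to any point in the client's own sequence of operations.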
> Support for RW/RO snapshots in HDFS
> -----------------------------------
>                 Key: HDFS-2802
>                 URL: https://issues.apache.org/jira/browse/HDFS-2802
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Hari Mankude
>            Assignee: Hari Mankude
>         Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf
> Snapshots are point in time images of parts of the filesystem or the entire filesystem.
> Snapshots can be a read-only or a read-write point in time copy of the filesystem. There are
> several use cases for snapshots in HDFS. I will post a detailed write-up soon with more

