hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4103) Support O(1) snapshot creation
Date Fri, 21 Dec 2012 03:31:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537650#comment-13537650
] 

Aaron T. Myers commented on HDFS-4103:
--------------------------------------

I've just finished reviewing the patch. Given that nothing's been committed to the branch
since this was committed, it seems easiest to me to revert the commit, address this feedback,
and then commit again with the feedback incorporated. If for some reason you'd rather address
this feedback via a follow-up JIRA, that's fine by me as well.

The patch overall looks very good. I agree with all of the comments that Suresh has previously
posted. Of the comments below, I think that #16 is probably the most serious.

# Some commented out code in TestSnapshot without explanation. Add a comment? File a JIRA?
# TestNestedSnapshots - I'm really not a fan of disabling loggers in the tests. If the tests
fails, it may be impossible to tell why because the relevant log message may have been disabled.
# With regard to the comment disabling the append test case of TestSnapshotPathINodes - appreciate
you adding the TODO, but can you please elaborate in that comment about what the problem is
currently with append? And/or file a follow-up JIRA to fix this issue?
# There's a few "System.out.println("XXX ...")" left throughout the patch. If you're going
to leave these in, please make it a little clearer what the point is, along the lines of "====>
Test completed, dumbing tree for debugging <====" or something.
# TestSnapshot - there's some commented out code which apparently would have tested FileAppend.
Add a TODO and/or file a JIRA?
# TestNestedSnapshots#print - please rename parameter "mess" to "message" or "msg" or something
clearer. Using "mess" threw me off a bit at first.
# TestNestedSnapshots#assertFile - could you please add a method comment here? Doesn't have
to be a proper javadoc necessarily, but it's a little tough to tell what the method's doing
at first blush. It's also a little odd that the method appears to be written so as to support
an arbitrary number of paths to verify since it uses varargs, but it in fact only supports
verifying the presence of exactly 3 paths. Maybe switch to a three-arg method?
# TestNestedSnapshots#assertFile - please move "bar" to a constant, given that it's hard-coded
both here and in testNestedSnapshots.
# Seems like TestSnapshotReplication#getINodeFile can be made private.
# TestSnapshotPathINodes#testSnapshotPathINodesAfterModification - there's an assertFalse(...)
which compares two values for mod times. Please consider adding an assert message there to
print out the actual values of of the mod times.
# Considering the many places where we now call INodeDirectory#addChild with the third argument
being "null", perhaps add a two-arg method which defaults to passing null?
# SnapshotDiff#compareTo - please change that_snapshot to "thatSnapshot" or "otherSnapshot"
# Considering that the Pair and Triple classes are pretty generic classes and are now being
used in several somewhat unrelated parts of the code, consider moving them out of inner classes
of INode and into some util package? Perhaps in Common?
# Really good class comment for SnapshotDiff. Very helpful and well-described.
# Having SnapshotDiff as a non-static inner class of INodeDirectorySnapshottable makes it
a little difficult to follow the scope of some of the references. I think it's a sufficiently
meaty class to warrant being a top-level class, with a reference to the associated INodeDirectorySnapshottable.
If for some reason you want to leave it as an inner class, let's make it a static inner class.
# SnapshotDiff#getSnapshotINode - given that this may end up recursively calling getSnapshotINode,
it seems like we might easily end up overflowing the stack if a directory has many snapshots
taken of it with changes between each snapshot. I believe by default a StackOverflowException
will end up getting called after a stack is 1024 method calls deep, yet the limit of how many
snapshots can exist per snapshottable directory is set to 65536.
# I recommend adding a class comment to INodeDirectoryWithSnapshot to explain a little bit
about how it does its work by overriding several methods of the INodeDirectory class to look
back into snapshots as appropriate. Once I understood it it was very clear, but I think it
could stand to be explained with a comment.
                
> Support O(1) snapshot creation
> ------------------------------
>
>                 Key: HDFS-4103
>                 URL: https://issues.apache.org/jira/browse/HDFS-4103
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: Snapshot (HDFS-2802)
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>             Fix For: Snapshot (HDFS-2802)
>
>         Attachments: h4103_20121129.patch, h4103_20121202b.patch, h4103_20121202.patch,
h4103_20121209b.patch, h4103_20121209.patch, h4103_20121210b.patch, h4103_20121210.patch,
h4103_20121211.patch, h4103_20121212b.patch, h4103_20121212.patch, h4103_20121213.patch, h4103_20121215b.patch,
h4103_20121215.patch, h4103_20121216.patch, h4103_20121217b.patch, h4103_20121217.patch, h4103_20121218.patch,
h4103_20121219.patch, h4103_20121220.patch
>
>
> In our first snapshot implementation, snapshot creation runs in O(N) and occupies O(N)
memory space, where N = # files + # directories + # symlinks in the snapshot.  The advantages
of the implementation are that there is no additional cost for the modifications after snapshots
are created, and it leads to a simple implementation.
> In this JIRA, we optimize snapshot creation to O(1) although it introduces additional
cost in the modifications after snapshots are created.  Note that the INode is given as an
assumption, otherwise, there is a non-constant cost to find the INode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message