hadoop-common-dev mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3637) Support for snapshots
Date Thu, 07 Aug 2008 20:24:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620734#action_12620734 ]

Allen Wittenauer commented on HADOOP-3637:
------------------------------------------

Overall, what has been proposed sounds very promising to me.  Some comments though:

Requirements
===========

Requirements #4 vs. Non-goal #2: "Only a few of those snapshots will be accessed simultaneously"

What happens if it is determined that finding the data that one is looking for is expensive?
In other words, what is Plan B?

On busy systems, I can see where many users could be searching for data in different snapshots
very, very easily, esp. when FUSE is involved.

Requirements #5:

While I understand what you are saying and why the requirement exists :) , it would be good
to make sure this is really well documented.

[An aside... I never thought of directed graphs as being a mathematical construct.  At least
I was taught them as part of my CS courses which were distinct from the math courses.  Hmm.]

"Special Number 500"
=================

Why 500?  That seems particularly arbitrary. I would recommend starting at a digit boundary.
 1000, 10000, 100, whatever. 

Namedir Structure
==============

What happens when the number of snapshots gets large?  Any concern about things like directory
name lookup caches at the (UNIX) file system level having issues?  Would it be a good idea
to be able to support a multilevel hashed structure now or wait till someone needs it?
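
To make the multilevel hashed idea concrete, here is a rough sketch of what I have in mind.
Everything in it is an assumption for illustration only and not part of the proposal: the
class name SnapshotDirLayout, the "snapshot-<id>" naming, and the two levels of 256 buckets
each.  The point is just that no single directory ends up holding every snapshot.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: spreads snapshot directories over two hashed levels. */
public final class SnapshotDirLayout {

    private SnapshotDirLayout() {}

    /**
     * Maps a snapshot id to root/<level1>/<level2>/snapshot-<id>, where each
     * level is one byte of the id's hash rendered as two hex digits.
     */
    public static Path pathFor(Path root, long snapshotId) {
        int h = Long.hashCode(snapshotId);
        int level1 = h & 0xFF;           // lowest byte picks the first-level bucket
        int level2 = (h >>> 8) & 0xFF;   // next byte picks the second-level bucket
        return root
                .resolve(String.format("%02x", level1))
                .resolve(String.format("%02x", level2))
                .resolve("snapshot-" + snapshotId);
    }

    public static void main(String[] args) {
        // Print where a few sequential snapshot ids would land under "name".
        Path root = Paths.get("name");
        for (long id = 1; id <= 3; id++) {
            System.out.println(pathFor(root, id));
        }
    }
}
```

With sequential ids the low byte changes on every snapshot, so consecutive snapshots land in
different first-level buckets.  Whether two levels is the right depth is exactly the sort of
thing I would want decided before the layout ships.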

Appending to Files
==============
I have a bit of concern about the "wait for some period" bit.  We've noticed that when the
file system gets full at the UNIX level, the name node goes a bit spastic while it tries to
hunt for free space.  Now, clearly the name node should be better behaved in this sort of
edge-case scenario.  But I'm wondering what the client should do when it retries after
waiting for the NN to COW the block under such conditions.

> Support for snapshots
> ---------------------
>
>                 Key: HADOOP-3637
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3637
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: Snapshots.pdf
>
>
> Support HDFS snapshots. It should support creating snapshots without shutting down the
> file system. Snapshot creation should be lightweight and a typical system should be able to
> support a few thousands concurrent snapshots. There should be a way to surface (i.e. mount)
> a few of these snapshots simultaneously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.