hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-432) support undelete, snapshots, or other mechanism to recover lost files
Date Mon, 18 Dec 2006 19:49:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-432?page=comments#action_12459441 ] 
Doug Cutting commented on HADOOP-432:

> doing this every few minutes is expensive

Yes, walking the entire trash bucket too frequently would be a problem.  So we can walk it
less frequently, avoid walking the whole thing, or both.  I've proposed bucketing the trash
into sub-directories that each cover 10 or more minutes, so that only the root trash directory
need be listed, and even that should only be listed every 30 or more minutes.
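
A minimal sketch of the bucketing idea, not the code in the attached patches. The directory-name format, the retention period, and the function names are assumptions made for illustration. It shows how a deletion time maps to a 10-minute bucket and how expired buckets can be picked from a listing of the trash root alone, without walking any bucket's contents.

```python
from datetime import datetime, timedelta, timezone

BUCKET_MINUTES = 10            # bucket width suggested in the comment above
BUCKET_FORMAT = "%Y%m%d%H%M"   # hypothetical directory-name format


def bucket_name(deleted_at: datetime) -> str:
    """Return the trash sub-directory name for a (UTC) deletion time."""
    floored = deleted_at.replace(
        minute=deleted_at.minute - deleted_at.minute % BUCKET_MINUTES,
        second=0,
        microsecond=0,
    )
    return floored.strftime(BUCKET_FORMAT)


def expired_buckets(bucket_names, now: datetime, retention: timedelta):
    """Select expired buckets using only the names listed in the trash root."""
    expired = []
    for name in bucket_names:
        try:
            start = datetime.strptime(name, BUCKET_FORMAT).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # not a bucket directory, so leave it alone
        bucket_end = start + timedelta(minutes=BUCKET_MINUTES)
        # A bucket expires once its newest possible entry is older than the retention period.
        if bucket_end + retention <= now:
            expired.append(name)
    return expired
```

Because the bucket's age is encoded in its name, the cost of each pass depends on the number of buckets rather than on the number of trashed files.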

> Creating a folder for every X minutes (how many?) will make restoring a file harder.

But, with globbing, it won't be too hard.  The primary point is not to make restoring files
ultra-simple but rather to make it possible.
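
As a rough illustration of restoring with globbing, the sketch below uses a local directory tree as a stand-in for the dfs. The layout (trash root, then one directory per time bucket, then the original path) and the name `find_in_trash` are assumptions, not an existing Hadoop command.

```python
import glob
import os


def find_in_trash(trash_root: str, original_path: str) -> list:
    """List every trashed copy of original_path, one per bucket that holds it."""
    relative = original_path.lstrip("/")
    # The single "*" stands for the time-bucket directory; everything else is literal.
    pattern = os.path.join(glob.escape(trash_root), "*", glob.escape(relative))
    return sorted(glob.glob(pattern))
```

If bucket names sort chronologically, the last entry returned is the most recently deleted copy, which is usually the one to move back.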

> an external process reclaiming space needs to be monitored, otherwise files will accumulate
> in the trash and the dfs will fill up

If the trash is full then folks can empty the trash.  I'm not arguing that we shouldn't start
a thread in the namenode that empties the trash, just that this thread should be reusable
code, written using the public FileSystem API.
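
One way to picture "reusable code" here is an emptier that touches storage only through a small public interface, so the same function could run in a namenode thread, an external process, or a user's shell. The sketch below is an assumption-laden illustration: `FileSystemLike`, `InMemoryFS`, and `empty_trash` are hypothetical names, and the two-method interface is a stand-in, not Hadoop's actual FileSystem API.

```python
from typing import Callable, Dict, List, Protocol, Set


class FileSystemLike(Protocol):
    """Hypothetical stand-in for the public file system interface."""

    def list_dir(self, path: str) -> List[str]: ...

    def delete(self, path: str) -> None: ...


def empty_trash(fs: FileSystemLike, trash_root: str,
                is_expired: Callable[[str], bool]) -> List[str]:
    """Delete expired buckets under trash_root and return their names."""
    removed = []
    for name in fs.list_dir(trash_root):
        if is_expired(name):
            fs.delete(f"{trash_root}/{name}")
            removed.append(name)
    return removed


class InMemoryFS:
    """Toy implementation used only to exercise empty_trash."""

    def __init__(self, children: Dict[str, Set[str]]):
        self.children = children  # directory path -> names of its entries

    def list_dir(self, path: str) -> List[str]:
        return sorted(self.children.get(path, set()))

    def delete(self, path: str) -> None:
        parent, _, name = path.rpartition("/")
        self.children.get(parent, set()).discard(name)
```

Since `empty_trash` takes the file system and the expiry rule as arguments, the scheduling policy (a background thread, a cron job, or a manual command) stays separate from the cleanup logic.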

Adding a trash can isn't going to magically resolve space issues.  It will primarily permit
folks who accidentally delete things using the command line to recover their files.  With
a per-user trash can, folks can easily monitor their trash usage manually if they like, and
admins can email users whose trash is large, or even empty it for them.  The cleanup thread
is an added feature that reduces the need for manual monitoring.

> support undelete, snapshots, or other mechanism to recover lost files
> ---------------------------------------------------------------------
>                 Key: HADOOP-432
>                 URL: http://issues.apache.org/jira/browse/HADOOP-432
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Yoram Arnon
>         Assigned To: Wendy Chien
>         Attachments: undelete12.patch, undelete16.patch, undelete17.patch
> currently, once you delete a file it's gone forever.
> most file systems allow some form of recovery of deleted files.
> a simple solution would be an 'undelete' command.
> a more comprehensive solution would include snapshots, manual and automatic, with scheduling
