hadoop-common-dev mailing list archives

From "Yoram Arnon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-432) support undelete, snapshots, or other mechanism to recover lost files
Date Fri, 22 Dec 2006 02:55:24 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-432?page=comments#action_12460373 ] 
Yoram Arnon commented on HADOOP-432:

> bucketing the trash into 10-or-more minute sub-directories...with globbing, it won't be too hard

True, but without sub-directories it will be even easier. And with the DFS GUI, globbing doesn't
help; you're left to drill down, down, down, then up, up, up to find your files. If you're unlucky,
your files, deleted seconds apart, end up in two different directories, both very deep.
And besides, *nobody* else does it this way. Windows shows a flat view of deleted files; not
sure about the Mac.
If we're going to reinvent the wheel, let's do things better than others, not worse.

> If the trash is full then folks can empty the trash

While there are no permissions in HDFS yet, there will be in the medium term, and then people
won't be able to delete each other's files. That will leave the administrator with the task
of cleaning out the trash when it grows large, which makes it a non-starter in my opinion.
A per-user trash can is interesting, but some thread still needs to clean up the trash in
the background, and it needs to be fair in that it removes the globally oldest deleted file first.
The intent was for files to stay around in case of accidental deletion, definitely not to
require a second operation in order to really delete a file. Without a background thread or
some other process that automatically removes old deleted files I'd vote against a trash altogether
- it would just make the system less usable.
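To make the fairness requirement concrete, here is a minimal, hypothetical sketch (not taken from the attached patch) of such a background reaper, modeled with plain Java collections rather than real HDFS namenode structures: entries are ordered by deletion time, so the globally oldest deleted file always expires first, regardless of which user's trash it sits in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of a fair trash reaper. Deletions from all users go
// into one priority queue keyed on deletion time, so expiry is globally
// oldest-first rather than per-user.
class TrashReaper {
    static class Entry {
        final String path;
        final long deletedAtMillis;
        Entry(String path, long deletedAtMillis) {
            this.path = path;
            this.deletedAtMillis = deletedAtMillis;
        }
    }

    private final PriorityQueue<Entry> queue =
        new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.deletedAtMillis));

    // Called when a file is moved into the trash.
    void recordDeletion(String path, long deletedAtMillis) {
        queue.add(new Entry(path, deletedAtMillis));
    }

    // Expire every entry deleted more than retentionMillis before 'now';
    // returns the expired paths in oldest-first order.
    List<String> expire(long now, long retentionMillis) {
        List<String> expired = new ArrayList<>();
        while (!queue.isEmpty()
               && now - queue.peek().deletedAtMillis > retentionMillis) {
            expired.add(queue.poll().path);
        }
        return expired;
    }
}
```

A real version would of course recover its state from the trash directory listing on restart; the point here is only the oldest-first ordering, which needs no per-user bookkeeping.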

> this thread should be reusable code, written using the public FileSystem API

Well, that would require the FileSystem API to support something like 'find /trash -mmin +n',
or 'ls -r | head -<n>', implemented internally in the namenode for performance reasons.
I'm not strictly against this, but it seems like an unnecessary complication at this time.
The implementation doesn't preclude exporting this API in the future.
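For illustration only, the two queries amount to something like the following, sketched here in plain Java over an in-memory path-to-mtime map rather than a real FileSystem listing (the names and the 'oldest n' reading of 'ls -r | head' are my assumptions, and the real version would run inside the namenode against its own metadata):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the two queries a trash-cleaning thread
// would need, expressed over a simple path -> modification-time map.
class TrashQueries {
    // Rough equivalent of `find /trash -mmin +n`: paths whose mtime is
    // more than 'minutes' minutes before 'now'.
    static List<String> olderThanMinutes(Map<String, Long> mtimes,
                                         long nowMillis, long minutes) {
        long cutoff = nowMillis - minutes * 60_000L;
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : mtimes.entrySet()) {
            if (e.getValue() < cutoff) out.add(e.getKey());
        }
        out.sort(Comparator.naturalOrder()); // deterministic order
        return out;
    }

    // One reading of `ls -r | head -<n>`: the n oldest entries by mtime.
    static List<String> oldestN(Map<String, Long> mtimes, int n) {
        List<String> paths = new ArrayList<>(mtimes.keySet());
        paths.sort(Comparator.comparingLong(mtimes::get));
        return paths.subList(0, Math.min(n, paths.size()));
    }
}
```

Either query is trivial given the metadata; the open question in the comment is only whether it belongs in the public FileSystem API now or stays internal.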

IMO, the approach taken by the proposed patch is the right one. It resembles other trashes,
it's a reasonably round wheel.

> support undelete, snapshots, or other mechanism to recover lost files
> ---------------------------------------------------------------------
>                 Key: HADOOP-432
>                 URL: http://issues.apache.org/jira/browse/HADOOP-432
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Yoram Arnon
>         Assigned To: Wendy Chien
>         Attachments: undelete12.patch, undelete16.patch, undelete17.patch
> currently, once you delete a file it's gone forever.
> most file systems allow some form of recovery of deleted files.
> a simple solution would be an 'undelete' command.
> a more comprehensive solution would include snapshots, manual and automatic, with scheduling

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira