hadoop-common-dev mailing list archives

From "Yoram Arnon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-432) support undelete, snapshots, or other mechanism to recover lost files
Date Fri, 15 Dec 2006 23:23:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-432?page=comments#action_12458970 ] 
Yoram Arnon commented on HADOOP-432:

* I just ran 'time hadoop dfs -lsr / > /dev/null' on our dfs from a client; it took 3:38
minutes real time and 1:40 minutes user+system time on the client, consuming 20-30% cpu on both
client and namenode. Depending on the number of files deleted, doing this every few minutes
is expensive. For comparison, 'time hadoop dfs -du /' takes 2 seconds (<1 second user+system),
so delivering the paths from the namenode to the client is the expensive part,
and the internal implementation in the namenode is cheap.
I repeated this locally on the namenode, where dfs -lsr took 3:45 minutes real time and 0:45
minutes user+system, so the network is not all to blame.

* data in the trash is arranged in its original folder layout, to enable a person to locate
her files and restore them. Creating a folder for every X minutes (how many?) will make restoring
a file harder.

* an external process reclaiming space needs to be monitored, otherwise files will accumulate
in the trash and the dfs will fill up. This could be achieved by a cron job, but then the
admin is required to do one extra step to set up dfs, or the namenode could fork off a clean-up

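To put the timings from the first point in perspective, here is a rough back-of-the-envelope sketch of how much wall-clock time a periodic full listing would occupy. The scan intervals are hypothetical (the comment only says "every few minutes"), and the function name is made up for illustration.

```python
# Wall-clock timings reported above, converted to seconds.
LSR_REAL_SECONDS = 3 * 60 + 38   # 'hadoop dfs -lsr /' run from a client
DU_REAL_SECONDS = 2              # 'hadoop dfs -du /'


def scan_duty_cycle(scan_seconds, interval_minutes):
    """Fraction of wall-clock time a scan is in flight if one starts every interval."""
    return scan_seconds / (interval_minutes * 60)


# Hypothetical intervals, chosen only to illustrate the trend.
for minutes in (5, 10, 60):
    lsr = scan_duty_cycle(LSR_REAL_SECONDS, minutes)
    du = scan_duty_cycle(DU_REAL_SECONDS, minutes)
    print(f"every {minutes:>2} min: lsr in flight {lsr:.1%}, du in flight {du:.1%}")
```

With these numbers, a listing started every 5 minutes would be in flight for roughly 73% of the time (218 of every 300 seconds), at the 20-30% cpu reported above on both client and namenode.
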
I'm 80% for the performance and simplicity of an internal thread, 20% for the safety of an external

What do others think?
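
For discussion, a minimal in-memory sketch of the internal-thread option: deleted files keep their original folder layout under a trash root, and a background thread expunges entries past a maximum age. This is not the attached patch and not a Hadoop API; TRASH_ROOT, Trash, and start_emptier are hypothetical names, and no real dfs is involved.

```python
import posixpath
import threading

TRASH_ROOT = "/trash"  # hypothetical location, not taken from the patch


def trash_path(original):
    """Mirror the original absolute path under TRASH_ROOT."""
    return posixpath.join(TRASH_ROOT, original.lstrip("/"))


class Trash:
    """In-memory stand-in for a trash directory."""

    def __init__(self):
        self._deleted_at = {}  # trash path -> deletion time in seconds
        self._lock = threading.Lock()

    def delete(self, original, now):
        """Record the file as deleted at time `now`; return its trash path."""
        with self._lock:
            path = trash_path(original)
            self._deleted_at[path] = now
            return path

    def restore(self, original):
        """Return True if the file was still in the trash and is restored."""
        with self._lock:
            return self._deleted_at.pop(trash_path(original), None) is not None

    def expunge(self, now, max_age):
        """Drop entries deleted more than max_age seconds ago; return them sorted."""
        with self._lock:
            expired = sorted(p for p, t in self._deleted_at.items() if now - t > max_age)
            for p in expired:
                del self._deleted_at[p]
            return expired


def start_emptier(trash, interval, max_age, clock):
    """Call expunge every `interval` seconds on a daemon thread.

    Returns an Event; set it to stop the thread.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            trash.expunge(clock(), max_age)

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Because the trash mirrors the original layout, restoring '/user/alice/a.txt' only needs the original path; with one folder per X minutes, the restore would first have to find which interval folder holds the file.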

> support undelete, snapshots, or other mechanism to recover lost files
> ---------------------------------------------------------------------
>                 Key: HADOOP-432
>                 URL: http://issues.apache.org/jira/browse/HADOOP-432
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Yoram Arnon
>         Assigned To: Wendy Chien
>         Attachments: undelete12.patch, undelete16.patch, undelete17.patch
> currently, once you delete a file it's gone forever.
> most file systems allow some form of recovery of deleted files.
> a simple solution would be an 'undelete' command.
> a more comprehensive solution would include snapshots, manual and automatic, with scheduling
