Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <17547304.1166471363241.JavaMail.jira@brutus>
Date: Mon, 18 Dec 2006 11:49:23 -0800 (PST)
From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-432) support undelete, snapshots, or other mechanism to recover lost files
In-Reply-To: <13436198.1155057493889.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[ http://issues.apache.org/jira/browse/HADOOP-432?page=comments#action_12459441 ]

Doug Cutting commented on HADOOP-432:
-------------------------------------

> doing this every few minutes is expensive

Yes, walking the
entire trash bucket too frequently would be a problem.  So we can walk it less frequently and/or not walk the whole thing.  I've proposed bucketing the trash into sub-directories that each cover 10 or more minutes, so that only the root trash directory need be listed, and even that should only be listed every 30 or more minutes.

> Creating a folder for every X minutes (how many?) will make restoring a file harder.

But, with globbing, it won't be too hard.  The primary point is not to make restoring files ultra-simple but rather to make it possible.

> an external process reclaiming space needs to be monitored, otherwise files will accumulate in the trash and the dfs will fill up

If the trash is full then folks can empty the trash.  I'm not arguing that we shouldn't start a thread in the namenode that empties the trash, just that this thread should be reusable code, written using the public FileSystem API.

Adding a trash can isn't going to magically resolve space issues.  It will primarily permit folks who accidentally delete things using the command line to recover their files.  With a per-user trash can, folks can easily monitor their trash usage manually if they like, and admins can email users whose trash is large, or even empty it for them.  The cleanup thread is an added feature that reduces the need for manual monitoring.

> support undelete, snapshots, or other mechanism to recover lost files
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-432
>                 URL: http://issues.apache.org/jira/browse/HADOOP-432
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Yoram Arnon
>         Assigned To: Wendy Chien
>         Attachments: undelete12.patch, undelete16.patch, undelete17.patch
>
>
> currently, once you delete a file it's gone forever.
> most file systems allow some form of recovery of deleted files.
> a simple solution would be an 'undelete' command.
> a more comprehensive solution would include snapshots, manual and automatic, with scheduling options.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
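
A minimal sketch of the bucketed-trash idea from the comment above, written in Python against the local filesystem rather than the Hadoop FileSystem API. The function names, the bucket-name format, and the 10-minute width are illustrative assumptions, not taken from any patch attached to HADOOP-432. It shows how a delete could land in a time-stamped sub-directory, how cleanup needs to list only the trash root, and how a glob can locate a trashed file.

```python
import glob
import os
import shutil
from datetime import datetime, timedelta

BUCKET_FORMAT = "%y%m%d%H%M"  # hypothetical naming scheme for bucket directories
BUCKET_MINUTES = 10           # assumed bucket width; must divide 60 in this sketch


def bucket_name(when: datetime) -> str:
    """Round `when` down to the start of its bucket and format it as a name."""
    floored = when.replace(minute=when.minute - when.minute % BUCKET_MINUTES,
                           second=0, microsecond=0)
    return floored.strftime(BUCKET_FORMAT)


def move_to_trash(path: str, trash_root: str, now: datetime) -> str:
    """Move `path` into the bucket for `now` instead of deleting it.

    Name collisions inside a bucket are not handled in this sketch.
    """
    bucket = os.path.join(trash_root, bucket_name(now))
    os.makedirs(bucket, exist_ok=True)
    target = os.path.join(bucket, os.path.basename(path))
    shutil.move(path, target)
    return target


def expunge(trash_root: str, now: datetime, max_age: timedelta) -> list:
    """Delete whole buckets older than `max_age`, listing only the trash root."""
    removed = []
    for name in sorted(os.listdir(trash_root)):
        try:
            started = datetime.strptime(name, BUCKET_FORMAT)
        except ValueError:
            continue  # not a bucket directory; leave it alone
        if now - started > max_age:
            shutil.rmtree(os.path.join(trash_root, name))
            removed.append(name)
    return removed


def find_in_trash(trash_root: str, filename: str) -> list:
    """Locate trashed copies of `filename` across all buckets with a glob."""
    return sorted(glob.glob(os.path.join(trash_root, "*", filename)))
```

Because the age of a bucket is encoded in its directory name, `expunge` never walks the contents of a bucket it keeps, which is the cost the comment is trying to avoid. A periodic cleanup thread could call it every 30 minutes or so with whatever retention period an administrator chooses.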