From: "Doug Cutting (JIRA)"
To: common-issues@hadoop.apache.org
Date: Wed, 12 May 2010 13:28:45 -0400 (EDT)
Subject: [jira] Commented: (HADOOP-6761) Improve Trash Emptier
Message-ID: <19027311.581273685325104.JavaMail.jira@thor>
In-Reply-To: <3590952.12841273613022929.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/HADOOP-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866640#action_12866640 ]

Doug Cutting commented on HADOOP-6761:
--------------------------------------

> Can anyone think of a clean and nice way to override these so that we can have a quick test that tests everything?

One possibility might be to interpret the configuration values as floating-point doubles, so that you can compatibly configure sub-minute intervals.

> Improve Trash Emptier
> ---------------------
>
>                 Key: HADOOP-6761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6761
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>         Attachments: HADOOP-6761.patch
>
>
> There are two inefficiencies in the Trash functionality right now that have caused some problems for us.
> First, if you configure your trash interval to be one day (24 hours), you eventually store two days' worth of data: both Current and the previous timestamped checkpoint, which will not be deleted until the end of the interval.
> Second, a lot of data accumulates in Trash before the Emptier wakes up. If a couple of million files have been trashed and the Emptier deletes them on HDFS, the NameNode will freeze until everything is removed (this particular problem will hopefully be addressed by HDFS-1143).
> My proposal is to have two configuration intervals: one for deleting the trashed data and another for checkpointing. For example, with intervals of one day and one hour, we would store only 25 hours of data instead of the current 48, and the deletions would happen in smaller chunks every hour of the day instead of one huge deletion at the end of the day.
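
To make the two ideas in this thread concrete, here is a minimal sketch in Python (not Hadoop's actual code). It assumes the intervals are configured in minutes, as the "sub-minute" remark implies. The function names interval_to_millis and max_retained_hours are hypothetical and exist only for illustration. The first shows how parsing the value as a floating-point number would let a test configure an interval of a few seconds while whole-minute settings keep working. The second reproduces the 48-hour versus 25-hour arithmetic from the issue description.

```python
def interval_to_millis(raw_minutes: str) -> int:
    """Parse an interval given in minutes into milliseconds.

    Hypothetical helper: accepting a float keeps whole-minute values
    such as "1440" working, while also allowing sub-minute values
    such as "0.05" (3 seconds) for quick tests.
    """
    minutes = float(raw_minutes)
    if minutes < 0:
        raise ValueError("interval must be non-negative")
    return round(minutes * 60 * 1000)


def max_retained_hours(delete_hours: float, checkpoint_hours: float) -> float:
    """Upper bound on the age of data sitting in Trash.

    Assumption: data is removed only as part of an expired checkpoint,
    so up to one extra checkpoint interval accumulates on top of the
    deletion interval.
    """
    return delete_hours + checkpoint_hours


if __name__ == "__main__":
    print(interval_to_millis("1440"))   # one day, in milliseconds
    print(interval_to_millis("0.05"))   # sub-minute interval for a test
    print(max_retained_hours(24, 24))   # single interval, as today
    print(max_retained_hours(24, 1))    # separate checkpoint interval
```

With a single 24-hour interval doing both jobs the bound is 48 hours. With a 24-hour deletion interval and a 1-hour checkpoint interval it drops to 25 hours, which matches the numbers in the proposal.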