hadoop-hdfs-dev mailing list archives

From "Dmytro Molkov (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-1144) Improve Trash Emptier
Date Tue, 11 May 2010 18:55:41 GMT
Improve Trash Emptier

                 Key: HDFS-1144
                 URL: https://issues.apache.org/jira/browse/HDFS-1144
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Dmytro Molkov
            Assignee: Dmytro Molkov

There are two inefficiencies in the Trash functionality right now that have caused some problems
for us.

First, if you configure your trash interval to be one day (24 hours), you eventually store
two days' worth of data: the Current directory and the previous timestamped checkpoint, which
will not be deleted until the end of the interval.
The second problem is that a lot of data accumulates in Trash before the Emptier wakes up. If
a couple of million files have been trashed when the Emptier runs its deletion, the NameNode
will freeze until everything is removed. (this particular problem hopefully will be addressed with

My proposal is to have two configuration intervals: one for deleting the trashed data and
another for checkpointing. This way, for example, with intervals of one day and one hour we
will store only 25 hours of data instead of the current 48, and deletions will happen in
smaller chunks every hour of the day instead of one huge deletion at the end of the day.
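
To make the 48-hour versus 25-hour arithmetic concrete, here is a rough simulation sketch. It is
not the Emptier's actual code: the function name, the one-file-per-hour workload, and the rule
"on each wake-up, delete checkpoints at least one deletion interval old, then checkpoint Current"
are all simplifying assumptions made for illustration.

```python
def simulate_trash(deletion_interval_h, checkpoint_interval_h, horizon_h=None):
    """Hypothetical model: one file is trashed per hour; the Emptier wakes
    every checkpoint_interval_h hours.

    Returns (peak_hours_of_data_held, largest_single_deletion_in_files).
    """
    horizon_h = horizon_h or 10 * deletion_interval_h
    current = []        # files sitting in Current (not yet checkpointed)
    checkpoints = {}    # checkpoint timestamp (hour) -> files in that checkpoint
    peak_held = 0
    largest_delete = 0

    for hour in range(1, horizon_h + 1):
        current.append(hour)  # one file trashed during this hour

        # Measure how much data Trash holds just before the Emptier may run.
        held = len(current) + sum(len(files) for files in checkpoints.values())
        peak_held = max(peak_held, held)

        if hour % checkpoint_interval_h == 0:
            # Emptier wakes up: drop checkpoints older than the deletion interval...
            expired = [s for s in checkpoints if hour - s >= deletion_interval_h]
            deleted = sum(len(checkpoints.pop(s)) for s in expired)
            largest_delete = max(largest_delete, deleted)
            # ...then turn Current into a new timestamped checkpoint.
            checkpoints[hour] = current
            current = []

    return peak_held, largest_delete


if __name__ == "__main__":
    # Single 24-hour interval (checkpointing and deletion tied together).
    print(simulate_trash(24, 24))
    # Proposed split: delete after 24 hours, checkpoint every hour.
    print(simulate_trash(24, 1))
```

Under these assumptions the single-interval case peaks at 48 hours of trashed data and deletes
24 hours of files in one pass, while the split case peaks at 25 hours and deletes one hour of
files per wake-up.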

