hadoop-common-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6761) Improve Trash Emptier
Date Mon, 17 May 2010 18:22:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868332#action_12868332 ]

Hadoop QA commented on HADOOP-6761:

-1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 944521.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 4 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 1018 javac compiler warnings (more than the trunk's current 1017 warnings).

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/522/console

This message is automatically generated.

> Improve Trash Emptier
> ---------------------
>                 Key: HADOOP-6761
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6761
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Dmytro Molkov
>            Assignee: Dmytro Molkov
>         Attachments: HADOOP-6761.2.patch, HADOOP-6761.3.patch, HADOOP-6761.patch
> There are two inefficiencies in the Trash functionality right now that have caused some
> problems for us.
> First, if you configure your trash interval to be one day (24 hours), you eventually
> store two days' worth of data: the Current directory and the previous timestamped
> checkpoint, which is not deleted until the end of the interval.
> The second problem is that a lot of data accumulates in Trash before the Emptier wakes up.
> If a couple of million files have been trashed and the Emptier deletes them on HDFS, the
> NameNode freezes until everything is removed. (Hopefully this particular problem will be
> addressed by HDFS-1143.)
> My proposal is to have two configuration intervals: one for deleting the trashed data
> and another for checkpointing. For example, with intervals of one day and one hour
> we would store only 25 hours of data instead of the current 48, and deletions would happen
> in smaller chunks every hour of the day instead of as one huge deletion at the end of the day.
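
The 25-hour versus 48-hour figures in the description can be checked with a small model. This is only a sketch under stated assumptions, not the patch's implementation: it assumes the Emptier wakes once per checkpoint interval, first deletes checkpoints at least one deletion interval old, then rolls Current into a new checkpoint. The function names are hypothetical.

```python
def worst_case_retention_hours(deletion_interval_h, checkpoint_interval_h):
    """Closed form: data can wait up to one checkpoint interval in Current,
    then its checkpoint lives for one deletion interval."""
    return deletion_interval_h + checkpoint_interval_h


def simulate_max_age_hours(deletion_interval_h, checkpoint_interval_h, horizon_h=240):
    """Simulate Emptier wake-ups and return the oldest data age ever observed.

    Assumes files are trashed continuously, so the oldest file in each
    checkpoint dates from the moment the previous checkpoint was taken.
    """
    checkpoints = []   # (created_at, time_of_oldest_file) pairs
    current_start = 0  # time of the oldest file still in Current
    max_age = 0
    for now in range(checkpoint_interval_h, horizon_h + 1, checkpoint_interval_h):
        # Age of the oldest trashed data just before the Emptier runs.
        oldest = min([o for _, o in checkpoints] + [current_start])
        max_age = max(max_age, now - oldest)
        # Assumed order: expunge expired checkpoints, then checkpoint Current.
        checkpoints = [(c, o) for c, o in checkpoints
                       if now - c < deletion_interval_h]
        checkpoints.append((now, current_start))
        current_start = now
    return max_age


if __name__ == "__main__":
    for deletion_h, checkpoint_h in [(24, 24), (24, 1)]:
        print(deletion_h, checkpoint_h,
              worst_case_retention_hours(deletion_h, checkpoint_h),
              simulate_max_age_hours(deletion_h, checkpoint_h))
```

With a single 24-hour interval used for both roles, the model retains up to 48 hours of data; with a 24-hour deletion interval and a 1-hour checkpoint interval it retains up to 25 hours, and each wake-up removes at most one hour's worth of trashed files.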

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
