hadoop-hdfs-issues mailing list archives

From "Hangjun Ye (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
Date Fri, 30 May 2014 08:22:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013411#comment-14013411 ]

Hangjun Ye commented on HDFS-6382:

I think we have two discussions here now: a TTL cleanup policy (implemented inside or outside
the NN), and a general mechanism to make implementing such a policy inside the NN easy.

I've been convinced that a specific TTL cleanup policy implementation is NOT likely to fly
in the NN's core code directly, so I'm more interested in pursuing a mechanism that enables
such policy implementations.

HBase has coprocessors (https://blogs.apache.org/hbase/entry/coprocessor_introduction), which
let people extend its functionality easily (w/o extending the base classes), e.g. for counting
rows or maintaining a secondary index. One could argue that most such usages do NOT necessarily
have to be implemented server-side, but having such a mechanism gives users the opportunity
to choose what is most suitable for their requirements.

If the NN had such an extensible mechanism (as Haohui suggested earlier), we could implement
a TTL cleanup policy in the NN in an elegant way (w/o touching the base classes). Since the NN
has already abstracted out "INode.Feature", we could implement a TTLFeature to hold the metadata.
The policy implementation doesn't have to go into the community's codebase if it's too specific;
we could keep it in our private branch. But building on a general mechanism (w/o touching the
base classes) makes it easy to maintain (considering that we upgrade to new Hadoop releases regularly).
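A minimal sketch of what such a feature might look like, modeled on the NN's INode.Feature marker-interface pattern. This is a self-contained illustration only: the names TtlFeature, ttlMillis, and creationTime are hypothetical, and the Feature interface here is a stand-in, not the actual HDFS class.

```java
// Sketch only: mirrors the INode.Feature pattern, not actual HDFS code.
public class TtlFeatureSketch {
    // Stand-in for the NN's INode.Feature marker interface, which lets
    // optional per-inode metadata be attached without touching base classes.
    interface Feature {}

    // Hypothetical feature holding the TTL metadata for one inode.
    static class TtlFeature implements Feature {
        private final long ttlMillis;     // how long the inode may live
        private final long creationTime;  // when the TTL clock started

        TtlFeature(long ttlMillis, long creationTime) {
            this.ttlMillis = ttlMillis;
            this.creationTime = creationTime;
        }

        // A cleanup policy would scan inodes and delete those where this holds.
        boolean isExpired(long now) {
            return now - creationTime >= ttlMillis;
        }
    }

    public static void main(String[] args) {
        TtlFeature f = new TtlFeature(1000L, 0L);
        System.out.println(f.isExpired(500L));   // false: still within TTL
        System.out.println(f.isExpired(1000L));  // true: TTL elapsed
    }
}
```

The point of the pattern is that the base INode classes stay untouched; the feature object carries all the policy-specific state.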

If you guys think such a general mechanism deserves consideration, we are happy to contribute
some effort.

> HDFS File/Directory TTL
> -----------------------
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
> In production environments, we often have a scenario like this: we want to back up files
> on HDFS for some time and then delete these files automatically. For example, we keep
> only 1 day's logs on local disk due to limited disk space, but we need about 1 month's
> logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs
> older than 1 month. This is a typical scenario for HDFS TTL, so we propose that HDFS
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired
> 3. If a TTL is set on a directory, its child files and directories will be deleted automatically
> after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent directory's
> 5. A global configuration is needed to specify whether deleted files/directories
> should go to the trash or not
> 6. A global configuration is needed to specify whether a directory with a TTL should
> be deleted when it is emptied by the TTL mechanism or not.
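The override rule in points 3-4 above could be sketched as follows: the effective TTL of a path is its own TTL if one is set, otherwise the nearest ancestor's. This is a hypothetical, self-contained illustration; the path-to-TTL map stands in for per-inode TTL metadata that HDFS does not have today.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposal's TTL inheritance/override semantics:
// a child's TTL, if set, overrides its parent directory's.
public class TtlResolver {
    // Hypothetical store of explicitly configured TTLs, keyed by path.
    private final Map<String, Long> ttlByPath = new HashMap<>();

    void setTtl(String path, long ttlMillis) {
        ttlByPath.put(path, ttlMillis);
    }

    // Walk up the path components until a TTL is found; null means no TTL applies.
    Long effectiveTtl(String path) {
        for (String p = path; !p.isEmpty(); p = parent(p)) {
            Long ttl = ttlByPath.get(p);
            if (ttl != null) {
                return ttl;
            }
        }
        // The loop stops before reaching the root, so check "/" separately.
        return ttlByPath.get("/");
    }

    private static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "" : path.substring(0, i);
    }

    public static void main(String[] args) {
        TtlResolver r = new TtlResolver();
        r.setTtl("/logs", 30L * 24 * 3600 * 1000);       // 1 month on the directory
        r.setTtl("/logs/audit", 90L * 24 * 3600 * 1000); // child overrides parent
        System.out.println(r.effectiveTtl("/logs/app/2014-05-30.log")); // inherits /logs
        System.out.println(r.effectiveTtl("/logs/audit/a.log"));        // child's own TTL
    }
}
```

Whether resolution happens eagerly (stamping children at set-time) or lazily (walking up at scan-time, as here) would be a design choice for the cleanup policy.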

This message was sent by Atlassian JIRA
