hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
Date Wed, 04 Jun 2014 18:45:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017992#comment-14017992 ]

Colin Patrick McCabe commented on HDFS-6382:

The xattrs branch was merged to trunk two weeks ago.  Since trunk is where development happens
anyway, you should be able to start now if you like.

Maybe post a design doc first if you want feedback.  It seems like the big question to be
answered is: where is this going to live?  We have had proposals for doing this as an MR job,
a separate daemon, or part of the balancer.  They all have pros and cons... it would be good
to write down the benefits and disadvantages of each option before making a choice.

I think any of these 3 options is possible and I wouldn't vote against any of them.  It's
up to you.  If it's a separate daemon, at minimum, we can put it in contrib/.  But you may
find that some options have a higher maintenance burden on you.  I also think that users don't
like running more daemons if they can help it.  But perhaps there is something I haven't thought
of that makes a separate daemon a good choice.

> HDFS File/Directory TTL
> -----------------------
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
> In production environments, we often have a scenario like this: we want to back up files
> on HDFS for some time and then have them deleted automatically. For example, we keep
> only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's
> logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs that
> are older than 1 month. This is a typical use case for HDFS TTL, so we propose that HDFS
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after the TTL has
> expired
> 3. If a TTL is set on a directory, its child files and directories will be deleted automatically
> after the TTL has expired
> 4. A child file's or directory's TTL configuration should override its parent directory's
> 5. A global configuration option is needed to control whether deleted files/directories
> should go to the trash
> 6. A global configuration option is needed to control whether a directory with a TTL should
> be deleted once the TTL mechanism has emptied it
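
Items 1 to 5 of the quoted proposal can be read as a small resolution rule, sketched below in
plain Python. This is only an illustration, not code from HDFS or from any patch on this issue.
The names effective_ttl, plan_action and use_trash are hypothetical, TTLs are assumed to be
stored per path in seconds, and expiry is assumed to be measured from the file's modification
time, which the proposal does not specify. Item 6 (removing emptied directories) is not covered.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def effective_ttl(path: str, ttls: Dict[str, int]) -> Optional[int]:
    """Return the TTL in seconds that applies to `path`, or None.

    The path's own TTL wins; otherwise the nearest ancestor directory
    that has a TTL is used (item 4: child overrides parent).
    """
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):
        ttl = ttls.get(str(candidate))
        if ttl is not None:
            return ttl
    return None


def plan_action(path: str, mtime: int, now: int,
                ttls: Dict[str, int], use_trash: bool = True) -> str:
    """Decide what a hypothetical TTL sweeper would do with `path`.

    Assumption: a file expires once `now - mtime` reaches its effective TTL.
    `use_trash` stands in for the global trash setting of item 5.
    """
    ttl = effective_ttl(path, ttls)
    if ttl is None or now - mtime < ttl:
        return "keep"
    return "move_to_trash" if use_trash else "delete"


DAY = 24 * 60 * 60
# Hypothetical configuration: 30 days for /logs, 1 day for /logs/debug.
ttls = {"/logs": 30 * DAY, "/logs/debug": 1 * DAY}
old_debug = plan_action("/logs/debug/x.log", mtime=0, now=2 * DAY, ttls=ttls)
young_app = plan_action("/logs/app/a.log", mtime=0, now=2 * DAY, ttls=ttls)
```

With this configuration, a two-day-old file under /logs/debug is past its 1-day TTL, while a
file of the same age under /logs/app still falls under the 30-day TTL inherited from /logs.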
