hadoop-hdfs-issues mailing list archives

From "Hangjun Ye (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
Date Wed, 28 May 2014 03:53:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010734#comment-14010734 ]

Hangjun Ye commented on HDFS-6382:

Implementing it outside the NN is definitely another option, and I agree with Colin that it's
not feasible to implement a complex cleanup policy (for example, one based on storage space)
inside the NN.

TTL is a very simple (but general) policy, and we might even consider it an attribute of
a file, like the number of replicas. It seems that handling it in the NN wouldn't introduce
much complexity.

Another benefit of having it inside the NN is that we don't have to handle the
authentication/authorization problem in a separate system. For example, we have a shared HDFS
cluster for many internal users, and we don't want anyone to set a TTL policy on another
user's files. The NN could handle this easily with its own authentication/authorization
mechanism.
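
To make the authorization point concrete, here is a minimal sketch in Python. It is not HDFS code: `Inode`, `set_ttl`, the `ttl_seconds` field, and the owner-or-superuser rule are all assumptions for illustration, since this thread does not say which permission setting a TTL would require.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inode:
    """Hypothetical stand-in for the per-file metadata a namenode keeps."""
    path: str
    owner: str
    replication: int = 3                 # an existing per-file attribute
    ttl_seconds: Optional[int] = None    # TTL stored alongside it (assumption)


def set_ttl(inode: Inode, caller: str, ttl_seconds: int,
            superuser: str = "hdfs") -> None:
    """Set a TTL only if the caller owns the file or is the superuser.

    The owner-or-superuser rule is an assumed policy, chosen to show that
    the check can sit next to the metadata it protects.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    if caller not in (inode.owner, superuser):
        raise PermissionError(f"{caller} may not set a TTL on {inode.path}")
    inode.ttl_seconds = ttl_seconds
```

With this shape, a call such as `set_ttl(inode, "bob", 60)` on a file owned by `alice` raises `PermissionError` and leaves the file's TTL unchanged, without any separate system having to re-implement the ownership check.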

So far a TTL-based cleanup policy is good enough for our scenario (Zesheng and I are from
the same company, and we support our company's internal usage of Hadoop), and it would
be nice to have a simple and workable solution in HDFS.

> HDFS File/Directory TTL
> -----------------------
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
> In production environments we often have a scenario like this: we want to back up files
> on HDFS for some time and then have them deleted automatically. For example, we keep
> only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's
> logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs that
> are older than 1 month. This is a typical scenario for HDFS TTL, so we propose that HDFS
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is
> expired
> 3. If a TTL is set on a directory, the child files and directories will be deleted automatically
> after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent directory's
> 5. A global configuration is needed to control whether the deleted files/directories
> should go to the trash or not
> 6. A global configuration is needed to control whether a directory with a TTL should
> be deleted when it is emptied by the TTL mechanism or not.
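
Items 2 to 4 of the quoted proposal can be sketched as a lookup that walks from a path up toward the root and takes the nearest TTL setting. This is only an illustration in Python, not HDFS code: `effective_ttl`, `is_expired`, and the `ttls` mapping are hypothetical names, and measuring expiry from the modification time is an assumption, since the proposal does not say when the clock starts.

```python
import posixpath
from typing import Dict, Optional


def effective_ttl(path: str, ttls: Dict[str, int]) -> Optional[int]:
    """Return the TTL in seconds that applies to `path`, or None.

    The nearest setting wins, so a child's own TTL overrides the TTL
    of its parent directory (item 4).
    """
    current = posixpath.normpath(path)
    while True:
        if current in ttls:
            return ttls[current]
        parent = posixpath.dirname(current)
        if parent == current:  # reached the root (or an empty relative path)
            return None
        current = parent


def is_expired(path: str, mtime: float, now: float,
               ttls: Dict[str, int]) -> bool:
    """True if `path` has an applicable TTL and has outlived it.

    Age is measured from `mtime` (assumption; see the lead-in).
    """
    ttl = effective_ttl(path, ttls)
    return ttl is not None and now - mtime > ttl
```

For example, with `ttls = {"/logs": 30 * 86400, "/logs/tmp": 86400}`, a file under `/logs/tmp` inherits the one-day TTL, while other files under `/logs` inherit the thirty-day one. Whether an expired file goes to the trash (item 5) and whether an emptied directory is removed (item 6) would be separate global settings and are not modeled here.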
