hadoop-hdfs-issues mailing list archives

From "Hangjun Ye (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
Date Thu, 05 Jun 2014 06:05:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018506#comment-14018506 ]

Hangjun Ye commented on HDFS-6382:

Thanks Colin. We will start drafting a design doc and ask for your help reviewing it.

Yes, xattrs remove the big burden of storing the policy; the major remaining question is
where to run the logic.

Besides these 3 options, another related piece might be the "trash". Currently trash is implemented
as a client-side capability: the trash cleanup logic (trash emptier) depends on FileSystem
to operate on the namespace and is basically a client-side function. But the trash emptier runs
*inside* the NN as a daemon thread, instead of as a separate daemon process. I guess it interacts
with the NN via RPC even though it runs inside the NN.

We can observe some similarities among trash, balancer, and the proposed TTL: they mainly need data
from the NN; they could be implemented as client-side capabilities (via RPC); and they need to run
periodically. So, if possible, could we unify all of these in one framework/daemon? That also echoes
Haohui's earlier points. And if it's implemented cleanly enough, the user could optionally run it
inside the NN as a daemon thread to have fewer jobs to maintain, as long as the user is willing to
take the risk of running additional logic inside the NN (w/o changing the NN's logic for this, as it
still interacts with the NN like a client).
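
To make the "one framework/daemon" idea a bit more concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: `PeriodicTask`, `MaintenanceRunner`, and the task names
are hypothetical and are not part of Hadoop. Each action stands in for client-side logic that would
talk to the NN via RPC, so the same runner could in principle be driven from a separate daemon process
or from a thread inside the NN.

```python
class PeriodicTask:
    """One maintenance job (hypothetical): a name, an interval in seconds, an action."""

    def __init__(self, name, interval, action):
        self.name = name
        self.interval = interval
        self.action = action
        self.next_run = 0.0  # due immediately on the first pass


class MaintenanceRunner:
    """Runs every registered task whose interval has elapsed.

    The clock is passed in explicitly, so the caller decides how the
    runner is driven (a loop in a standalone daemon, or a daemon thread).
    """

    def __init__(self):
        self.tasks = []

    def register(self, name, interval, action):
        self.tasks.append(PeriodicTask(name, interval, action))

    def run_due(self, now):
        """Run all tasks that are due at time `now`; return their names."""
        ran = []
        for task in self.tasks:
            if now >= task.next_run:
                task.action()  # assumed to act on the namespace as a client
                task.next_run = now + task.interval
                ran.append(task.name)
        return ran


# Usage: the actions here are placeholders that only record that they ran.
log = []
runner = MaintenanceRunner()
runner.register("trash-emptier", 60, lambda: log.append("trash"))
runner.register("ttl-cleaner", 3600, lambda: log.append("ttl"))
first_pass = runner.run_due(0)
second_pass = runner.run_due(60)
```

Because the tasks only see a client-style interface, moving the runner into or out of the NN would
not require changing the tasks themselves, which is the property the paragraph above relies on.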

That's just a preliminary idea; we might still want to implement the TTL as a separate daemon first,
as that's the most straightforward. Let's discuss more after we have the design doc.

> HDFS File/Directory TTL
> -----------------------
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
> In production environments, we often have a scenario like this: we want to back up files
on HDFS for some time and then have these files deleted automatically. For example, we keep
only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's
logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs which
are older than 1 month. This is a typical scenario for HDFS TTL. So here we propose that HDFS
support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is
expired
> 3. If a TTL is set on a directory, the child files and directories will be deleted automatically
after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent directory's
> 5. A global configuration is needed to configure whether the deleted files/directories
should go to the trash or not
> 6. A global configuration is needed to configure whether a directory with a TTL should
be deleted when it is emptied by the TTL mechanism or not.
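
The override rule in item 4 and the expiry rule in items 2 and 3 can be sketched as follows. This is
a rough illustration in Python, not the proposed implementation: the function names, the dictionary
layout, and the choice to measure a file's age from its modification time are all assumptions, since
the proposal does not say which timestamp a TTL is measured from.

```python
import posixpath


def effective_ttl(path, ttls):
    """Return the TTL in seconds that applies to `path`, or None if unset.

    `ttls` maps a path to its explicitly configured TTL (hypothetical
    layout). The nearest setting wins, so a child's own TTL overrides
    its parent directory's (item 4).
    """
    current = path
    while True:
        if current in ttls:
            return ttls[current]
        parent = posixpath.dirname(current)
        if parent == current:  # reached the root without finding a TTL
            return None
        current = parent


def expired_files(files, ttls, now):
    """Return the files whose age exceeds their effective TTL.

    `files` maps a file path to its modification time in epoch seconds.
    Measuring age from modification time is an assumption.
    """
    result = []
    for path, mtime in sorted(files.items()):
        ttl = effective_ttl(path, ttls)
        if ttl is not None and now - mtime > ttl:
            result.append(path)
    return result


# Usage: /logs keeps files for 30 days, but /logs/debug overrides that with 1 day.
DAY = 86400
ttls = {"/logs": 30 * DAY, "/logs/debug": 1 * DAY}
files = {"/logs/app.log": 0, "/logs/debug/d.log": 0, "/data/x": 0}
after_two_days = expired_files(files, ttls, now=2 * DAY)
after_31_days = expired_files(files, ttls, now=31 * DAY)
```

Whether an expired file is moved to the trash or deleted outright (item 5), and whether an emptied
directory is itself removed (item 6), would be separate configuration decisions layered on top of
this check.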
