hadoop-hdfs-issues mailing list archives

From "Zesheng Wu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
Date Thu, 12 Jun 2014 02:57:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028768#comment-14028768 ]

Zesheng Wu commented on HDFS-6382:
----------------------------------

[~szetszwo], thanks for your valuable suggestions.
bq. Using xattrs for TTL is a good idea. Do we really need ttl in milliseconds? Do you think
that the daemon could guarantee such accuracy? We don't want to waste namenode memory space
to store trailing zeros/digits for each ttl. How about supporting symbolic ttl notation, e.g.
10h, 5d?
Yes, I agree that the daemon can't guarantee millisecond accuracy, and in fact there is
no need for such accuracy. As you suggested, we can use encoded bytes to save the NN's
memory.
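
To make the symbolic notation concrete, here is a rough sketch of how a value like 10h or 5d could be parsed and packed into a few bytes. It is only an illustration, not the design in the attached document: the class name `TtlNotation`, the unit letters s/m/h/d, and the 5-byte layout (one unit byte plus a 4-byte count) are all assumptions.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not an HDFS class. */
final class TtlNotation {
    private TtlNotation() {}

    /** Converts a symbolic ttl such as "10h" or "5d" to seconds. */
    static long parseToSeconds(String ttl) {
        return toSeconds(unitOf(ttl), countOf(ttl));
    }

    /** Packs the ttl into 5 bytes: 1 unit byte followed by a 4-byte count (assumed layout). */
    static byte[] encode(String ttl) {
        char unit = unitOf(ttl);
        int count = countOf(ttl);
        toSeconds(unit, count); // rejects unknown units before encoding
        return ByteBuffer.allocate(5).put((byte) unit).putInt(count).array();
    }

    /** Reads the seconds value back from the 5-byte form produced by encode. */
    static long decodeToSeconds(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        char unit = (char) buf.get();
        return toSeconds(unit, buf.getInt());
    }

    private static char unitOf(String ttl) {
        if (ttl == null || ttl.length() < 2) {
            throw new IllegalArgumentException("Invalid ttl: " + ttl);
        }
        return ttl.charAt(ttl.length() - 1);
    }

    private static int countOf(String ttl) {
        unitOf(ttl); // validates null and length
        int count = Integer.parseInt(ttl.substring(0, ttl.length() - 1));
        if (count <= 0) {
            throw new IllegalArgumentException("ttl must be positive: " + ttl);
        }
        return count;
    }

    private static long toSeconds(char unit, int count) {
        switch (unit) {
            case 's': return count;
            case 'm': return TimeUnit.MINUTES.toSeconds(count);
            case 'h': return TimeUnit.HOURS.toSeconds(count);
            case 'd': return TimeUnit.DAYS.toSeconds(count);
            default:
                throw new IllegalArgumentException("Unknown ttl unit: " + unit);
        }
    }
}
```

With this sketch, `TtlNotation.encode("5d")` yields a 5-byte array, compared with 8 bytes for a millisecond value stored as a long. Whether the real xattr value would use a layout like this is an open design choice.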

bq. The name "Supervisor" sounds too general. How about calling it "TtlManager" for the moment?
If there are more new features added to the tool, we may change the name later.
OK, "TtlManager" is more suitable for the moment.

bq. For setting ttl on a directory foo, write permission on the parent directory
of foo is not enough. Namenode also checks rwx for all subdirectories of foo for recursive
delete. 
Nice catch. If we want to conform to the delete semantics mentioned by Colin, we should check
the subdirectories recursively.
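
The recursive check can be pictured with a toy in-memory model. This is not the NameNode's permission checker; `TtlPermissionSketch`, `Dir`, and the "rwx"-style access strings are hypothetical names invented for the example. It assumes the rule stated above: write permission on the parent of foo, plus rwx on every subdirectory beneath foo.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model for illustration only; not the NameNode permission checker. */
final class TtlPermissionSketch {
    private TtlPermissionSketch() {}

    /** A directory with the caller's effective access, e.g. "rwx" or "r-x". */
    static final class Dir {
        final String name;
        final String access;
        final List<Dir> subdirs = new ArrayList<>();

        Dir(String name, String access) {
            this.name = name;
            this.access = access;
        }

        /** Adds a subdirectory and returns it so that trees can be built inline. */
        Dir addSubdir(String childName, String childAccess) {
            Dir child = new Dir(childName, childAccess);
            subdirs.add(child);
            return child;
        }
    }

    /**
     * Returns true if the caller could recursively delete target: write access on
     * the parent, and rwx on every subdirectory below target (assumed rule).
     */
    static boolean mayDeleteRecursively(Dir parent, Dir target) {
        if (!parent.access.contains("w")) {
            return false;
        }
        Deque<Dir> pending = new ArrayDeque<>(target.subdirs);
        while (!pending.isEmpty()) {
            Dir dir = pending.pop();
            if (!dir.access.equals("rwx")) {
                return false;
            }
            pending.addAll(dir.subdirs);
        }
        return true;
    }
}
```

In this model a single nested subdirectory with "r-x" is enough to reject setting a TTL on foo, even when the caller can write to foo's parent.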

bq. BTW, permission could be changed from time to time. A user may be able to delete a file/dir
at the time of setting TTL but the same user may not have permission to delete the same file/dir
when the ttl expires.
The deletion will be done by a superuser (the user that the "TtlManager" runs as), so it seems
this is not a problem?

bq. I suggest not to check additional permission requirement on setting ttl but run as the
particular user when deleting the file. Then we need to add username to the ttl xattr.
Good point, but adding the username to the ttl xattr requires more of the NN's memory, so we
should weigh whether the trade-off is worth it.


> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>         Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf
>
>
> In production environments we often have a scenario like this: we want to back up files
on HDFS for some time and then have them deleted automatically. For example, we keep
only 1 day's logs on local disk due to limited disk space, but we need about 1 month's
logs to debug program bugs, so we keep all the logs on HDFS and delete logs that
are older than 1 month. This is a typical use case for an HDFS TTL, so we propose that HDFS
support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after the TTL
expires
> 3. If a TTL is set on a directory, its child files and directories will be deleted automatically
after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent directory's
> 5. A global configuration is needed to control whether the deleted files/directories
should go to the trash or not
> 6. A global configuration is needed to control whether a directory with a TTL should
be deleted when it is emptied by the TTL mechanism or not.
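
The override rule in item 4 can be sketched as a nearest-ancestor lookup. The code below is a simplified illustration under stated assumptions, not the proposed implementation: `TtlResolver` is a hypothetical class, TTLs are kept in an in-memory map rather than in xattrs, and paths are assumed to be absolute and already normalized.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical in-memory model of TTL inheritance; not an HDFS class. */
final class TtlResolver {
    private final Map<String, Long> ttlSecondsByPath = new HashMap<>();

    /** Records a TTL (in seconds) set directly on a file or directory. */
    void setTtl(String path, long ttlSeconds) {
        ttlSecondsByPath.put(path, ttlSeconds);
    }

    /**
     * Returns the TTL that applies to path: its own TTL if one is set,
     * otherwise the TTL of the nearest ancestor that has one.
     */
    OptionalLong effectiveTtl(String path) {
        String current = path;
        while (current != null) {
            Long ttl = ttlSecondsByPath.get(current);
            if (ttl != null) {
                return OptionalLong.of(ttl);
            }
            current = parentOf(current);
        }
        return OptionalLong.empty();
    }

    /** Returns the parent of an absolute path, or null for the root. */
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        if (path.equals("/") || slash < 0) {
            return null;
        }
        return slash == 0 ? "/" : path.substring(0, slash);
    }
}
```

For example, with a 30-day TTL on /logs and a 1-day TTL on /logs/debug, a file under /logs/debug resolves to 1 day while every other file under /logs resolves to 30 days.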



--
This message was sent by Atlassian JIRA
(v6.2#6252)
