Date: Thu, 29 May 2014 15:21:02 +0000 (UTC)
From: "Chris Nauroth (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-6382) HDFS File/Directory TTL

    [ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012442#comment-14012442 ]

Chris Nauroth commented on HDFS-6382:
-------------------------------------

bq. The implemented mechanism inside the NameNode would (maybe periodically) execute all policies specified by users, and it would do it as a superuser safely, as authentication/authorization have been done when user set their policies to the NameNode.

This logic is subject to time of check/time of use race conditions, possibly resulting in incorrect deletion of data.
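A minimal sketch of this kind of race, using a toy in-memory model rather than any real HDFS API. Every name here (ToyNameNode, set_ttl, internal_sweep, external_sweep, the users "alice" and "bob") is hypothetical. The permission check is simplified to "owner or superuser may delete", and every TTL is assumed to have already expired when a sweep runs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

SUPERUSER = "hdfs"  # assumed superuser name for this toy model


@dataclass
class Inode:
    owner: str
    ttl_set_by: Optional[str] = None  # user who set the TTL, if any


class ToyNameNode:
    """Toy stand-in for a NameNode namespace; not the HDFS API."""

    def __init__(self) -> None:
        self.files: Dict[str, Inode] = {}

    def create(self, path: str, owner: str) -> None:
        self.files[path] = Inode(owner)

    def _may_delete(self, user: str, path: str) -> bool:
        # Simplified rule: only the owner or the superuser may delete.
        return user == SUPERUSER or self.files[path].owner == user

    def set_ttl(self, user: str, path: str) -> None:
        # Time of check: authorization is evaluated once, when the TTL is set.
        if not self._may_delete(user, path):
            raise PermissionError(f"{user} may not set a TTL on {path}")
        self.files[path].ttl_set_by = user

    def chown(self, path: str, new_owner: str) -> None:
        self.files[path].owner = new_owner

    def delete(self, user: str, path: str) -> None:
        # Time of use: authorization is evaluated for the calling user.
        if not self._may_delete(user, path):
            raise PermissionError(f"{user} may not delete {path}")
        del self.files[path]


def internal_sweep(nn: ToyNameNode) -> None:
    """Expiry inside the NameNode: deletes as the superuser, so the
    current permissions of the user who set the TTL are never consulted."""
    for path, inode in list(nn.files.items()):
        if inode.ttl_set_by is not None:
            nn.delete(SUPERUSER, path)


def external_sweep(nn: ToyNameNode) -> List[str]:
    """Expiry from an external process: each delete is issued as the user
    who set the TTL, so authorization is re-checked at delete time."""
    refused = []
    for path, inode in list(nn.files.items()):
        if inode.ttl_set_by is not None:
            try:
                nn.delete(inode.ttl_set_by, path)
            except PermissionError:
                refused.append(path)
    return refused


def scenario(sweep) -> bool:
    """Set a TTL as the owner, revoke ownership, run the sweep.
    Returns True if /file1 still exists afterwards."""
    nn = ToyNameNode()
    nn.create("/file1", owner="alice")
    nn.set_ttl("alice", "/file1")
    nn.chown("/file1", "bob")
    sweep(nn)
    return "/file1" in nn.files
```

In this model, scenario(internal_sweep) ends with /file1 deleted even though "alice" lost ownership before the sweep, while scenario(external_sweep) leaves the file in place because the delete is refused at the time it is attempted.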
For example, imagine the following sequence:
# A user calls the setttl command on /file1.  Authentication is successful, and the authenticated user is the file owner, so the NN decides the user is authorized to set a TTL.
# An admin changes the owner of /file1 in order to revoke the user's access.
# Now the NN's background expiration thread/job starts running.  It finds a TTL on /file1 and deletes the file.  Since this runs as the HDFS super-user, nothing blocks the delete, even though the user who set the TTL no longer has permission to delete.

With an external process, authentication and authorization are enforced at the time of delete for the specific user, so there is no time of check/time of use race condition, and there is no chance of an incorrect delete.

Running some code as a privileged user might look expedient in some ways, but it also compromises the file system permissions model somewhat.

> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> In production environments, a common scenario is the following: we want to back up files on HDFS for some time and then have them deleted automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs that are older than 1 month. This is a typical scenario for an HDFS TTL, so we propose that HDFS support TTLs.
> Following are some details of this proposal:
> 1. HDFS can support a TTL on a specified file or directory.
> 2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires.
> 3. If a TTL is set on a directory, its child files and directories will be deleted automatically after the TTL expires.
> 4. A child file/directory's TTL configuration should override its parent directory's.
> 5. A global configuration is needed to control whether deleted files/directories go to the trash.
> 6. A global configuration is needed to control whether a directory with a TTL should be deleted when it is emptied by the TTL mechanism.



--
This message was sent by Atlassian JIRA
(v6.2#6252)