From: "Sameer Paranjpye (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Mon, 10 Sep 2007 15:09:30 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1869) access times of HDFS files

[ https://issues.apache.org/jira/browse/HADOOP-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526282 ]

Sameer Paranjpye commented on HADOOP-1869:
------------------------------------------

> Another option to consider is making this a separate log that's buffered

This is a pretty good option. It should cost almost nothing to write the access to a buffered log. I suspect that for the use case Allen describes it would be sufficient to discover the outliers, i.e. files and directories that haven't been accessed in months.

> access times of HDFS files
> --------------------------
>
>                 Key: HADOOP-1869
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1869
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>
> HDFS should support some type of statistics that allows an administrator to determine when a file was last accessed.
> Since HDFS does not have quotas yet, it is likely that users keep on accumulating files in their home directories without much regard to the amount of space they are occupying. This causes memory-related problems with the namenode.
> Access times are costly to maintain. AFS does not maintain access times. I think DCE-DFS does maintain access times with a coarse granularity.
> One proposal for HDFS would be to implement something like an "access bit".
> 1. This access-bit is set when a file is accessed. If the access bit is already set, then this call does not result in a transaction.
> 2. A FileSystem.clearAccessBits() indicates that the access bits of all files need to be cleared.
> An administrator can effectively use the above mechanism (maybe a daily cron job) to determine files that are recently used.
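
For readers following the "access bit" proposal quoted above, here is a minimal in-memory sketch of how the two steps might behave. It is not HDFS code. The class AccessBitSketch and the methods addFile, recordAccess, unaccessedFiles and transactionCount are hypothetical names made up for illustration; only clearAccessBits comes from the proposal. The transaction counter stands in for namenode edit-log writes, and counting a clear as a single transaction is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "access bit" idea; not part of HDFS.
class AccessBitSketch {
    // path -> access bit (true once the file was accessed since the last clear)
    private final Map<String, Boolean> accessBits = new TreeMap<>();
    // stand-in for the number of namenode transactions (assumption)
    private int transactions = 0;

    void addFile(String path) {
        accessBits.put(path, Boolean.FALSE);
    }

    // Step 1: set the bit on access; if it is already set, no transaction results.
    // Returns true only when a transaction was recorded.
    boolean recordAccess(String path) {
        Boolean bit = accessBits.get(path);
        if (bit == null) {
            throw new IllegalArgumentException("unknown file: " + path);
        }
        if (bit) {
            return false; // bit already set: nothing to log
        }
        accessBits.put(path, Boolean.TRUE);
        transactions++;
        return true;
    }

    // Step 2: clear the access bits of all files.
    // Counted here as one transaction, which is an assumption of this sketch.
    void clearAccessBits() {
        accessBits.replaceAll((path, bit) -> Boolean.FALSE);
        transactions++;
    }

    // Files not accessed since the last clear, in sorted path order.
    List<String> unaccessedFiles() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : accessBits.entrySet()) {
            if (!e.getValue()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    int transactionCount() {
        return transactions;
    }

    public static void main(String[] args) {
        AccessBitSketch ns = new AccessBitSketch();
        ns.addFile("/user/alice/a");
        ns.addFile("/user/alice/b");
        ns.recordAccess("/user/alice/a");
        ns.recordAccess("/user/alice/a"); // repeated access, no new transaction
        System.out.println("unaccessed: " + ns.unaccessedFiles());
        System.out.println("transactions: " + ns.transactionCount());
    }
}
```

Under these assumptions, a periodic job (the daily cron job mentioned in the proposal) would read the list of unaccessed files and then call clearAccessBits() to start the next observation window. Repeated reads of a hot file cost at most one transaction per window.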