hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1869) access times of HDFS files
Date Mon, 10 Sep 2007 19:19:29 GMT
access times of HDFS files

                 Key: HADOOP-1869
                 URL: https://issues.apache.org/jira/browse/HADOOP-1869
             Project: Hadoop
          Issue Type: New Feature
          Components: dfs
            Reporter: dhruba borthakur

HDFS should support some type of statistics that allows an administrator to determine when
a file was last accessed. 

Since HDFS does not have quotas yet, users are likely to keep accumulating files in
their home directories without much regard to the amount of space they occupy. Because the
namenode keeps the metadata for every file in memory, this causes memory-related problems
on the namenode.

Access times are costly to maintain. AFS does not maintain access times. I think DCE-DFS does
maintain access times with a coarse granularity.

One proposal for HDFS would be to implement something like an "access bit".
1. The access bit is set when a file is accessed. If the access bit is already set, then
the access does not result in a transaction.
2. A call to FileSystem.clearAccessBits() clears the access bits of all files.

An administrator can use the above mechanism (maybe from a daily cron job) to determine
which files have been used recently.
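
The access-bit proposal above can be sketched as a small in-memory model. This is only
an illustration, not HDFS code: the class AccessBitSketch and all of its methods are
hypothetical names, and counting clearAccessBits() as a single transaction is an
assumption that the proposal does not state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the proposed access bit; not part of HDFS.
class AccessBitSketch {
    // All known file paths, kept sorted so that reports are deterministic.
    private final Set<String> files = new TreeSet<>();
    // Paths whose access bit is currently set.
    private final Set<String> accessed = new HashSet<>();
    // Number of transactions that would have been logged.
    private long transactions = 0;

    void create(String path) {
        files.add(path);
    }

    // Sets the access bit. Returns true only if the bit changed, which is
    // the only case that costs a transaction (step 1 of the proposal).
    boolean recordAccess(String path) {
        if (!files.contains(path)) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        if (accessed.add(path)) {
            transactions++;
            return true;
        }
        return false;
    }

    // Clears every access bit (step 2 of the proposal).
    // Assumption: the whole clear is logged as one transaction.
    void clearAccessBits() {
        accessed.clear();
        transactions++;
    }

    // Files accessed since the last clear, in sorted order.
    List<String> accessedSinceLastClear() {
        List<String> result = new ArrayList<>();
        for (String path : files) {
            if (accessed.contains(path)) {
                result.add(path);
            }
        }
        return result;
    }

    long transactionCount() {
        return transactions;
    }
}
```

In this model a daily job would call accessedSinceLastClear() to collect the files used
since the previous run and then call clearAccessBits() to start the next interval.
Repeated reads of a file within one interval add nothing to transactionCount(), which
is the point of using a bit instead of a timestamp.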

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
