hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1869) access times of HDFS files
Date Mon, 25 Aug 2008 10:23:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1869:
-------------------------------------

    Attachment: accessTime4.patch

Incorporated most review comments. I do not update the in-memory access time on every access.
Instead, the in-memory access time is kept in sync with the value persisted on disk. Otherwise,
the access time of a file could move back in time when a namenode restarts!
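
The invariant described above can be illustrated with a minimal Java sketch. It is not the code in accessTime4.patch: the class name, the `precisionMs` threshold that decides when an access is worth persisting, and the in-memory list standing in for the on-disk log are all assumptions made for illustration. The point it shows is that the in-memory value changes only together with a persisted record, so a restart can never observe an older time than the one previously reported.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; names and the threshold policy are assumptions. */
class AccessTimeSketch {
    // Assumed threshold: how much newer an access must be before it is persisted.
    private final long precisionMs;
    // In-memory access time; only changed together with a persisted record.
    private long accessTimeMs;
    // Stand-in for the namenode's persistent log (an assumption for this sketch).
    private final List<Long> persistedLog = new ArrayList<>();

    AccessTimeSketch(long initialAccessTimeMs, long precisionMs) {
        this.accessTimeMs = initialAccessTimeMs;
        this.precisionMs = precisionMs;
        persistedLog.add(initialAccessTimeMs);
    }

    /** Returns true if the access was recorded, both in the log and in memory. */
    boolean recordAccess(long nowMs) {
        if (nowMs <= accessTimeMs + precisionMs) {
            return false; // too close to the stored value: change nothing
        }
        persistedLog.add(nowMs); // persist first
        accessTimeMs = nowMs;    // then update the in-memory copy
        return true;
    }

    long inMemoryAccessTime() {
        return accessTimeMs;
    }

    /** The value a restarted process would recover from the log. */
    long accessTimeAfterRestart() {
        return persistedLog.get(persistedLog.size() - 1);
    }
}
```

Because skipped accesses touch neither copy, `inMemoryAccessTime()` and `accessTimeAfterRestart()` always agree in this sketch.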

I also ran benchmarks with NNThroughputBenchmark. Performance is practically unchanged across
all benchmarks. In particular, the results of the "open benchmark with 300 threads and 100K
files" are as follows:

patch (ops/sec)    trunk (ops/sec)
----------------------------------
59916              59865
59171              59191


> access times of HDFS files
> --------------------------
>
>                 Key: HADOOP-1869
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1869
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: accessTime1.patch, accessTime4.patch
>
>
> HDFS should support some type of statistics that allows an administrator to determine
> when a file was last accessed.
> Since HDFS does not have quotas yet, it is likely that users keep on accumulating files
> in their home directories without much regard to the amount of space they are occupying.
> This causes memory-related problems with the namenode.
> Access times are costly to maintain. AFS does not maintain access times. I think DCE-DFS
> does maintain access times with a coarse granularity.
> One proposal for HDFS would be to implement something like an "access bit".
> 1. This access bit is set when a file is accessed. If the access bit is already set,
> then this call does not result in a transaction.
> 2. A FileSystem.clearAccessBits() call indicates that the access bits of all files need
> to be cleared.
> An administrator can effectively use the above mechanism (maybe a daily cron job) to
> determine which files were recently used.
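
The access-bit proposal in the issue description could be sketched roughly as follows in Java. This is an illustration only, not an HDFS API: apart from the name `clearAccessBits()`, which comes from the description, every class, method, and field here is hypothetical, and whether clearing the bits would itself be logged as a transaction is not stated in the issue, so the sketch does not count it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the proposed access bit; not an HDFS class. */
class AccessBitSketch {
    // One access bit per file path.
    private final Map<String, Boolean> accessBits = new HashMap<>();
    // Number of transactions caused by setting access bits.
    private long transactions = 0;

    void addFile(String path) {
        accessBits.put(path, false);
    }

    /** Step 1: set the bit on access; an already-set bit costs no transaction. */
    void access(String path) {
        Boolean bit = accessBits.get(path);
        if (bit == null) {
            throw new IllegalArgumentException("unknown file: " + path);
        }
        if (!bit) {
            accessBits.put(path, true);
            transactions++;
        }
    }

    /** Step 2: clear the access bits of all files. */
    void clearAccessBits() {
        accessBits.replaceAll((path, bit) -> false);
    }

    /** Files accessed since the last clear, sorted by path. */
    List<String> recentlyUsed() {
        List<String> used = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : accessBits.entrySet()) {
            if (e.getValue()) {
                used.add(e.getKey());
            }
        }
        Collections.sort(used);
        return used;
    }

    long transactionCount() {
        return transactions;
    }
}
```

A periodic job, such as the daily cron job mentioned in the description, would call `recentlyUsed()` to collect the files touched since the previous run and then `clearAccessBits()` to start the next interval.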

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

