hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-1869) access times of HDFS files
Date Thu, 28 Aug 2008 21:07:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626354#action_12626354 ]

shv edited comment on HADOOP-1869 at 8/28/08 2:07 PM:
----------------------------------------------------------------------

I don't understand. There must be some secret use case that you don't want to talk about or
something. I have all these questions:
- Why do we need to be able to setAccessTime() to Aug 31, 1991 or Jul 15, 2036?
- Why do we need setAccessTime() when we never needed setModificationTime()?
- My understanding is that the main use case for access time is to let ops recognize files
that have not been used for the last 6 months and remove them on the basis that they are old. So
a user can lose files if he mistakenly sets aTime to, e.g., 1 B.C. Or, alternatively, a user can
set aTime on files 1 year in advance, and that will keep ops from removing them for the next
1.5 years.
- Moreover, the local file system, KFS, and S3 will not support setAccessTime(), but HDFS will. Is
it right to make it part of the generic FileSystem interface?
- Pointing to utime() is the same as pointing to FSNamesystem.unprotectedSetAccessTime().
I still cannot change aTime or mTime using bash.
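To make the third point concrete, here is a minimal sketch of the kind of cleanup rule ops might apply, assuming "not used for 6 months" means "access time more than 183 days before now". The class and method names (StaleFileCheck, isRemovalCandidate, candidateSince) are hypothetical and not part of any Hadoop API; the sketch only shows how an arbitrary setAccessTime() lets a user push a file into, or out of, the removal window.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical ops policy, not Hadoop code: a file whose access time is more
// than about 6 months old becomes a candidate for removal.
public class StaleFileCheck {
    static final Duration THRESHOLD = Duration.ofDays(183); // assumed "6 months"

    // True if, judged only by its access time, the file looks unused.
    static boolean isRemovalCandidate(Instant accessTime, Instant now) {
        return Duration.between(accessTime, now).compareTo(THRESHOLD) > 0;
    }

    // The earliest instant at which a file with this access time becomes a candidate.
    static Instant candidateSince(Instant accessTime) {
        return accessTime.plus(THRESHOLD);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2008-08-28T00:00:00Z");

        // ISO year 0 is 1 B.C.: the file is immediately eligible for removal.
        Instant ancient = ZonedDateTime.of(0, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC).toInstant();
        System.out.println("aTime 1 B.C., candidate now? " + isRemovalCandidate(ancient, now));

        // aTime set 1 year into the future: protected for 1 year + 6 months.
        Instant future = now.plus(Duration.ofDays(365));
        System.out.println("aTime +1 year, candidate now? " + isRemovalCandidate(future, now));
        System.out.println("first eligible at: " + candidateSince(future));
    }
}
```

With a future aTime of now + 365 days, the file first becomes eligible 548 days from now, which is the "1.5 years" from the comment.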

I guess I am saying I am okay with a touchAC() method (as in {{touch -ac}}), but it is already
there in effect: it is called getBlockLocations(), and I don't see why you need more.
The rest looks great.

      was (Author: shv):
    I don't understand. There must be some secret use case that you don't want to talk about
or something. I have all these questions:
- Why do we need to be able to setAccessTime() to Aug 31, 1991 or Jul 15, 2036?
- Why do we need setAccessTime() when we never needed setModificationTime()?
- My understanding is that the main use case for access time is to let ops recognize files
that have not been used for the last 6 months and remove them on the basis that they are old. So
a user can lose files if he mistakenly sets aTime to, e.g., 1 B.C. Or, alternatively, a user can
set aTime on files 1 year in advance, and that will keep ops from removing them for the next
1.5 years.
- Moreover, the local file system, KFS, and S3 will not support setAccessTime(), but HDFS will. Is
it right to make it part of the generic FileSystem interface?
- Pointing to utime() is the same as pointing to FSNamesystem.unprotectedSetAccessTime().
I still cannot change aTime or mTime using bash.
I guess I am saying I am okay with a touch() method, but it is already there in effect: it is called getBlockLocations(),
and I don't see why you need more.
The rest looks great.
  
> access times of HDFS files
> --------------------------
>
>                 Key: HADOOP-1869
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1869
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.19.0
>
>         Attachments: accessTime1.patch, accessTime4.patch, accessTime5.patch
>
>
> HDFS should support some type of statistics that allows an administrator to determine when a file was last accessed.
> Since HDFS does not have quotas yet, it is likely that users keep accumulating files in their home directories without much regard to the amount of space they occupy. This causes memory-related problems on the namenode.
> Access times are costly to maintain. AFS does not maintain access times. I think DCE-DFS does maintain access times, at a coarse granularity.
> One proposal for HDFS would be to implement something like an "access bit".
> 1. This access bit is set when a file is accessed. If the access bit is already set, then this call does not result in a transaction.
> 2. A FileSystem.clearAccessBits() call indicates that the access bits of all files need to be cleared.
> An administrator can effectively use the above mechanism (maybe a daily cron job) to determine which files were recently used.
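The two rules of the access-bit proposal can be modelled as a small in-memory table. This is only a sketch under stated assumptions: AccessBitTable, addFile, recordAccess, recentlyUsed and the transaction counter are hypothetical names, clearAccessBits() here is a method of the sketch rather than the proposed FileSystem call, and the bulk clear is deliberately not counted as a transaction. It illustrates why the scheme is cheap: repeated reads of a file cost a single logged edit per period.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the proposed "access bit"; not HDFS code.
public class AccessBitTable {
    // path -> true once the file has been accessed since the last clear
    private final Map<String, Boolean> accessBits = new HashMap<>();
    // per-access edits that would have to be logged as namenode transactions
    private int transactions = 0;

    public void addFile(String path) {
        accessBits.put(path, false);
    }

    // Rule 1: set the bit on access; if it is already set, no transaction.
    public void recordAccess(String path) {
        Boolean bit = accessBits.get(path);
        if (bit == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        if (!bit) {
            accessBits.put(path, true);
            transactions++;
        }
    }

    // Rule 2: clear the access bits of all files (not counted as a transaction here).
    public void clearAccessBits() {
        accessBits.replaceAll((path, bit) -> false);
    }

    // What a daily cron job could read just before clearing: files used since the last clear.
    public Set<String> recentlyUsed() {
        Set<String> used = new TreeSet<>();
        for (Map.Entry<String, Boolean> e : accessBits.entrySet()) {
            if (e.getValue()) {
                used.add(e.getKey());
            }
        }
        return used;
    }

    public int transactions() {
        return transactions;
    }

    public static void main(String[] args) {
        AccessBitTable table = new AccessBitTable();
        table.addFile("/user/alice/part-00000");
        table.addFile("/user/alice/old.log");

        // Three reads of the same file cost only one transaction.
        for (int i = 0; i < 3; i++) {
            table.recordAccess("/user/alice/part-00000");
        }
        System.out.println("recently used: " + table.recentlyUsed());
        System.out.println("transactions:  " + table.transactions());

        // Daily cron job: report, then clear the bits for the next period.
        table.clearAccessBits();
        System.out.println("after clear:   " + table.recentlyUsed());
    }
}
```

The trade-off compared with a full access timestamp is resolution: the bit only answers "used since the last clear?", so the cron job's period becomes the granularity of the statistic.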

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

