hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1377) Creation time and modification time for hadoop files and directories
Date Wed, 13 Jun 2007 21:01:47 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504421

Konstantin Shvachko commented on HADOOP-1377:

- Please avoid introducing methods that use the deprecated UTF8 class; String would be better here:
  public DFSFileInfo getFileInfo(UTF8 src) throws IOException {

- All imports are redundant.
- I think that, for compatibility across different file systems, FileStatus should
be an interface. For HDFS we should have a class HDFSFileStatus, which should be part
of DFSFileInfo and combine all status fields. That way we will not need to change
protocols and internal name-node interfaces when we add or modify status fields.
- In the FileStatus constructor you are assigning blockSize to itself:
    this.blockSize = blockSize;
I guess the constructor is missing a blockSize parameter.

- I am not sure whether HADOOP-1377 should be built on top of HADOOP-1298 or
vice versa, but I agree with Doug that the public API should use FileStatus.
That is why DfsPath should introduce getFileStatus() rather than a getter for each new field.

- Rather than adding a new parameter to every method that modifies meta-data,
I would just add FileStatus as a single parameter.
- getFileInfo should not be public and should not take UTF8 as a parameter:
public DFSFileInfo getFileInfo(UTF8 src) throws IOException {...}
The same applies to FSNamesystem.getFileInfo().

- loadFSEdits() has a lot of duplicated code, which deserves to be factored out into
separate method(s). I'd serialize the entire HDFSFileStatus, which is Writable anyway.
Same for FSImage: I'd serialize the entire FileStatus.
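The interface split, the corrected constructor, the String-based package-private lookup and the single-object serialization suggested above could look roughly like the minimal sketch below. All names (FileStatus as an interface, HDFSFileStatus, NamespaceSketch) are hypothetical and do not reflect the attached patch or Hadoop's actual classes; write()/readFields() only mirror the method signatures of Hadoop's Writable so that the sketch compiles without a Hadoop dependency.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical file-system-neutral status interface (not Hadoop's actual API).
interface FileStatus {
  long getLength();
  boolean isDir();
  short getReplication();
  long getBlockSize();
  long getCreationTime();      // milliseconds since the epoch
  long getModificationTime();  // milliseconds since the epoch
}

// Hypothetical HDFS implementation; write/readFields mirror the
// signatures of Hadoop's Writable interface.
class HDFSFileStatus implements FileStatus {
  private long length;
  private boolean isDir;
  private short replication;
  private long blockSize;
  private long creationTime;
  private long modificationTime;

  HDFSFileStatus() {}  // empty instance, to be filled by readFields()

  HDFSFileStatus(long length, boolean isDir, short replication,
                 long blockSize, long creationTime, long modificationTime) {
    this.length = length;
    this.isDir = isDir;
    this.replication = replication;
    this.blockSize = blockSize;  // assigned from the constructor parameter
    this.creationTime = creationTime;
    this.modificationTime = modificationTime;
  }

  public long getLength() { return length; }
  public boolean isDir() { return isDir; }
  public short getReplication() { return replication; }
  public long getBlockSize() { return blockSize; }
  public long getCreationTime() { return creationTime; }
  public long getModificationTime() { return modificationTime; }

  // All status fields are written in one place, so edit-log and image
  // code do not repeat per-field writes.
  public void write(DataOutput out) throws IOException {
    out.writeLong(length);
    out.writeBoolean(isDir);
    out.writeShort(replication);
    out.writeLong(blockSize);
    out.writeLong(creationTime);
    out.writeLong(modificationTime);
  }

  // Reads the fields back in exactly the order write() emitted them.
  public void readFields(DataInput in) throws IOException {
    length = in.readLong();
    isDir = in.readBoolean();
    replication = in.readShort();
    blockSize = in.readLong();
    creationTime = in.readLong();
    modificationTime = in.readLong();
  }
}

// Hypothetical name-node table: String paths, a package-private lookup,
// and one status argument instead of one parameter per field.
class NamespaceSketch {
  private final Map<String, HDFSFileStatus> files = new HashMap<>();

  void addFile(String src, HDFSFileStatus status) {
    files.put(src, status);
  }

  HDFSFileStatus getFileInfo(String src) {  // not public, takes a String
    return files.get(src);
  }
}
```

Because every field goes through write()/readFields(), adding a status field later would touch one class rather than each edit-log and image loader.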

> Creation time and modification time for hadoop files and directories
> --------------------------------------------------------------------
>                 Key: HADOOP-1377
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1377
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CreationModificationTime.html, CreationTime.patch
> This issue will document the requirements, design and implementation of creation times
and modification times of hadoop files and directories.
> My proposal is to support two additional attributes for each file and directory
in HDFS. The "creation time" is the time when the file/directory was created. It is an 8-byte
integer stored in each FSDirectory.INode. The "modification time" is the time when the last
modification occurred to the file/directory. It is an 8-byte integer stored in the FSDirectory.INode.
These two fields are stored in the FSEdits and FSImage as part of the transaction that
created the file/directory.
> My current proposal is to not support "access time" for a file/directory. It is costly
to implement and current applications might not need it.
> In the current implementation, the "modification time" for a file will be the same as its
creation time because HDFS files are currently unmodifiable. Setting file attributes (e.g.
setting the replication factor) of a file does not modify the "modification time" of that
file. The "modification time" for a directory is either its creation time or the time when
the most recent file-delete or file-create occurred in that directory.
> A new command named "hadoop dfs -lsl" will display the creation time and modification
time of the files/directories that it lists. The output of the existing command "hadoop dfs
-ls" will not be affected.
> The ClientProtocol will change because DFSFileInfo will have two additional fields: the
creation time and modification time of the file that it represents. This information can be
retrieved by clients through the ClientProtocol.getListings() method. The FileSystem public
API will have two additional methods: getCreationTime() and getModificationTime().
> The datanodes are completely transparent to this design and implementation and require
no changes.
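The timestamp rules in the proposal can be illustrated with a simplified sketch: both times are 8-byte longs on the inode, a file's modification time equals its creation time, a directory's modification time advances when a child is created or deleted, and changing replication leaves it untouched. INodeSketch is a hypothetical stand-in, not FSDirectory.INode; the current time is passed in explicitly (real code would likely use System.currentTimeMillis()) so the behavior is easy to check.

```java
import java.util.TreeMap;

// Hypothetical, simplified inode; not the actual FSDirectory.INode code.
class INodeSketch {
  final String name;
  final boolean isDirectory;
  short replication;
  final long creationTime;   // 8-byte value, fixed at creation
  long modificationTime;     // 8-byte value
  final TreeMap<String, INodeSketch> children = new TreeMap<>();

  INodeSketch(String name, boolean isDirectory, short replication, long now) {
    this.name = name;
    this.isDirectory = isDirectory;
    this.replication = replication;
    this.creationTime = now;
    this.modificationTime = now;  // files are immutable, so mtime == ctime
  }

  // Creating a child advances this directory's modification time.
  INodeSketch addChild(String childName, boolean dir, short repl, long now) {
    INodeSketch child = new INodeSketch(childName, dir, repl, now);
    children.put(childName, child);
    modificationTime = now;
    return child;
  }

  // Deleting an existing child also advances it; a missing child changes nothing.
  boolean removeChild(String childName, long now) {
    if (children.remove(childName) == null) {
      return false;
    }
    modificationTime = now;
    return true;
  }

  // Changing an attribute such as replication does not touch modificationTime.
  void setReplication(short repl) {
    this.replication = repl;
  }
}
```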

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
