hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4489) Use InodeID as an identifier of a file in HDFS protocols and APIs
Date Wed, 10 Apr 2013 20:17:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628197#comment-13628197 ]

Suresh Srinivas commented on HDFS-4489:
---------------------------------------

bq. But if you approach from the view point of owners of existing hardware that was spec'ed
to hold certain size of namespace, it can be viewed as a decrease of capacity.

Again, I do not believe anyone runs the NN with memory configured very tightly, given the nature
of garbage collection. That said, to make further progress, the following optimizations can be done:

# Initialize the map only when this feature is enabled. Should take away roughly 1/3 of extra
memory.
# Reuse existing bits in INodeId - https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12618468&commentId=13508432.
Should take away roughly 1/3 of extra memory.
# Use the first block ID of the file (after ensuring that even an empty file has an associated
block) as the InodeID. This is very ugly and mixes two abstractions that should not be mixed. I
am reluctant to make this optimization.
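
For illustration only, the first optimization (allocating the map only when the feature is enabled) could look roughly like the sketch below. This is not code from the patch: the class name LazyInodeMap, its methods, and the use of a String path as the map value are all hypothetical stand-ins for the real NameNode structures.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an ID-to-inode map that costs no heap
 * unless the feature is enabled. Not the HDFS-4489 implementation.
 */
class LazyInodeMap {
    private final boolean featureEnabled;
    // Stays null until the first insert, so a disabled feature allocates nothing.
    private Map<Long, String> idToPath;

    LazyInodeMap(boolean featureEnabled) {
        this.featureEnabled = featureEnabled;
    }

    /** Records the mapping only when the feature is enabled. */
    void put(long inodeId, String path) {
        if (!featureEnabled) {
            return;
        }
        if (idToPath == null) {
            idToPath = new HashMap<>();
        }
        idToPath.put(inodeId, path);
    }

    /** Returns the path for the ID, or null if unknown or the feature is disabled. */
    String get(long inodeId) {
        return idToPath == null ? null : idToPath.get(inodeId);
    }

    /** True once the backing map has actually been allocated. */
    boolean isAllocated() {
        return idToPath != null;
    }

    public static void main(String[] args) {
        LazyInodeMap disabled = new LazyInodeMap(false);
        disabled.put(1001L, "/user/a/file");
        System.out.println("disabled allocated: " + disabled.isAllocated());

        LazyInodeMap enabled = new LazyInodeMap(true);
        enabled.put(1001L, "/user/a/file");
        System.out.println("enabled lookup: " + enabled.get(1001L));
    }
}
```

The sketch only shows where the allocation decision sits; it says nothing about how much memory the real map would save.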

My vote is to keep the code simple and the abstractions clean. If folks think the above optimizations
are worth pursuing, I will update the patch.
                
> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> The benefits of using InodeID to uniquely identify a file are manifold. Here
are a few of them:
> 1. uniquely identify a file across renames; related JIRAs include HDFS-4258 and HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been replaced or
renamed to, the file name and size combination is not reliable, but the combination of file
id and size is unique.
> 3. id-based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of filename (HDFS-385).
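
Point 2 above can be made concrete with a small sketch. It is an illustration under assumptions, not distcp code: the FileSnapshot class, its fields, and both comparison methods are hypothetical, and the IDs and paths are made-up example values.

```java
/**
 * Hypothetical sketch contrasting a name-and-size check with an
 * id-and-size check. Not taken from distcp or from the HDFS-4489 patch.
 */
final class FileSnapshot {
    final String path;
    final long fileId;   // assumed to be the InodeID exposed to clients
    final long length;

    FileSnapshot(String path, long fileId, long length) {
        this.path = path;
        this.fileId = fileId;
        this.length = length;
    }

    /** Name-based check: fooled when a different file takes over the same path. */
    static boolean sameByNameAndSize(FileSnapshot a, FileSnapshot b) {
        return a.path.equals(b.path) && a.length == b.length;
    }

    /** ID-based check: a replaced file carries a different ID even at the same path. */
    static boolean sameByIdAndSize(FileSnapshot a, FileSnapshot b) {
        return a.fileId == b.fileId && a.length == b.length;
    }

    public static void main(String[] args) {
        // The file seen during the first scan.
        FileSnapshot before = new FileSnapshot("/data/part-0", 1001L, 4096L);
        // A different file of the same length renamed onto the same path later.
        FileSnapshot after = new FileSnapshot("/data/part-0", 2002L, 4096L);

        System.out.println("name+size says unchanged: " + sameByNameAndSize(before, after));
        System.out.println("id+size says unchanged:   " + sameByIdAndSize(before, after));
    }
}
```

In this example the name-based check reports the file as unchanged while the ID-based check detects the replacement, which is the case the description is pointing at.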

