hadoop-hdfs-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4489) Use InodeID as an identifier of a file in HDFS protocols and APIs
Date Thu, 25 Apr 2013 22:14:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642290#comment-13642290 ]

Sanjay Radia commented on HDFS-4489:

Nathan, a question.
Suresh is willing to do the performance benchmark, but I am trying to understand where you
are coming from. Yahoo and FB create very large namespaces by simply buying more memory and
increasing the size of the heap. Do you worry about cache pollution when you create 50K more
files? Given that the NN heap (many GBs) is so much larger than the cache, does the additional
inode and inode-map size impact overall system performance? Suresh has argued that a 24GB
heap grows by 625MB. Measuring this feature's memory growth as a percentage of the total
heap size is a more realistic way to assess its impact than looking at the growth of an
individual data structure like the inode.
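
As a rough check of the percentage framing, taking the 24GB and 625MB figures above at face value and assuming 1 GB = 1024 MB (the unit convention is an assumption, not something stated in this thread):

```latex
\frac{625\ \text{MB}}{24 \times 1024\ \text{MB}} = \frac{625}{24576} \approx 0.0254 \approx 2.5\%
```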

IMHO, not having an inode-map and inode number was a serious limitation in the original implementation
of the NN. I am willing to pay for the extra memory given the value that inode-id and inode-map bring
(as described by Suresh at the beginning of this Jira). Permissions, access time, etc. also added
to the memory cost of the NN and were accepted because of the value they bring.

> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>             Fix For: 2.0.5-beta
> The benefits of using InodeID to uniquely identify a file are manifold. Here are a few of them:
> 1. Uniquely identify a file across renames; related JIRAs include HDFS-4258 and HDFS-4437.
> 2. Modification checks in tools like distcp. Since a file could have been replaced or
> renamed to, the combination of file name and size is not reliable, but the combination of
> file id and size is unique.
> 3. ID-based protocol support (e.g., NFS).
> 4. Make the pluggable block placement policy use fileid instead of filename (HDFS-385).
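
Item 2 in the list above can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: `FileSnapshot`, `unchangedByPathAndSize`, and `unchangedByIdAndSize` are made-up names, not HDFS or distcp classes, and the IDs, paths, and sizes are invented values.

```java
public class FileIdentityCheck {

    /** Hypothetical snapshot of file metadata; not an HDFS class. */
    static final class FileSnapshot {
        final long fileId;   // stand-in for an inode ID
        final String path;
        final long length;

        FileSnapshot(long fileId, String path, long length) {
            this.fileId = fileId;
            this.path = path;
            this.length = length;
        }
    }

    /** Name-and-size check: cannot see that a different file now sits at the same path. */
    static boolean unchangedByPathAndSize(FileSnapshot before, FileSnapshot after) {
        return before.path.equals(after.path) && before.length == after.length;
    }

    /** ID-and-size check: a replaced file carries a different ID, so the change is detected. */
    static boolean unchangedByIdAndSize(FileSnapshot before, FileSnapshot after) {
        return before.fileId == after.fileId && before.length == after.length;
    }

    public static void main(String[] args) {
        FileSnapshot original = new FileSnapshot(1001L, "/data/part-0", 4096L);
        // A different file of the same size has been renamed to the original path.
        FileSnapshot replaced = new FileSnapshot(2002L, "/data/part-0", 4096L);

        System.out.println(unchangedByPathAndSize(original, replaced)); // true: replacement missed
        System.out.println(unchangedByIdAndSize(original, replaced));   // false: replacement detected
    }
}
```

The sketch only shows why a path-based comparison can miss a replacement that an ID-based comparison catches; how a real tool would obtain the file ID is outside its scope.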

