hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4489) Use InodeID as an identifier of a file in HDFS protocols and APIs
Date Wed, 27 Mar 2013 00:19:15 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614721#comment-13614721
] 

Suresh Srinivas commented on HDFS-4489:
---------------------------------------

bq. for a total of 40 bytes on a 64-bit JVM. So, adding 16-24 bytes is a pretty substantial
new memory use.
Here are the things that go into the ~180 bytes:
INode is an object. It comes with the cost of 16 bytes object header overhead. Members include:
# byte[] name - I assume typically ~56 bytes for this. That is (16 bytes object overhead,
8 byte length + bytes that make up file name, say 32)
# reference to byte[] name - 8 bytes
# long permission at the cost of 8 bytes.
# parent reference at 8 bytes cost
# modification time at 8 bytes cost
# accessTime at 8 bytes cost

That is roughly 112 bytes (16 + 56 + 8 + 8 + 8 + 8 + 8).

Typically most of the INodes are INode files (I will leave the other type of inodes as an
exercise).
# It has BlockInfo[]. This is again 16 bytes of object overhead, 8 bytes of length and,
for a file with say two blocks, two 8-byte references, for a cost of 40 bytes.
# It has long header that adds another 8 bytes.

Total ~160 bytes. So the ~180 bytes I had posted is not very far off; that number was based
on a calculation I did long ago.
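
For anyone who wants to vary the assumptions (name length, blocks per file), the tally above can be sketched as a few lines of Python. The constants are the figures assumed in this comment for a 64-bit JVM, not measurements, and the function names are made up for illustration.

```python
# Back-of-envelope tally of the per-INodeFile estimate in this comment.
# Sizes are the comment's assumed figures for a 64-bit JVM, not measured values.
OBJECT_HEADER = 16  # per-object header overhead
ARRAY_LENGTH = 8    # array length field, as assumed in the comment
REFERENCE = 8       # one object reference
LONG = 8            # one long field


def inode_base_bytes(name_len=32):
    """Estimated bytes for the fields common to every INode."""
    name_array = OBJECT_HEADER + ARRAY_LENGTH + name_len  # byte[] name, ~56
    return (OBJECT_HEADER   # the INode object itself
            + name_array
            + REFERENCE     # reference to byte[] name
            + LONG          # permission
            + REFERENCE     # parent reference
            + LONG          # modification time
            + LONG)         # access time


def inode_file_bytes(name_len=32, blocks=2):
    """Estimated bytes for an INode file with the given number of blocks."""
    block_array = OBJECT_HEADER + ARRAY_LENGTH + blocks * REFERENCE  # BlockInfo[]
    return inode_base_bytes(name_len) + block_array + LONG  # plus the long header


print(inode_base_bytes(), inode_file_bytes())
```

With the defaults this reproduces the 112 and 160 byte figures; longer file names or more blocks per file push the total up.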

That said, 16-24 bytes might seem like a huge percentage (10 to 15%) of INode size. But how
much of the NN heap is allocated to inodes? Assuming inodes make up 1/3, blocks make up
another 1/3, and the remaining 1/3 goes to floating garbage, head room etc., the net
impact on NN heap is 3 to 5%. That is not far off from the analysis posted above.
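
The same estimate as a small Python sketch, in case someone wants to plug in their own split of the heap. The 160-byte INode size and the 1/3 share are the assumptions stated in this comment; the function name is made up for illustration.

```python
def heap_impact(extra_bytes, inode_bytes=160, inode_heap_fraction=1 / 3):
    """Fractional growth of the NN heap from adding extra_bytes to every inode.

    Assumes inodes occupy inode_heap_fraction of the heap and that the rest
    of the heap (blocks, garbage, head room) is unaffected.
    """
    per_inode_growth = extra_bytes / inode_bytes
    return per_inode_growth * inode_heap_fraction


for extra in (16, 24):
    print(f"{extra} extra bytes per inode: {heap_impact(extra):.1%} of heap")
```

If inodes take a larger share of the heap on a given install, the impact scales up proportionally.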

I believe half of the work is already in trunk. The remaining two JIRAs need to go in. I believe
doing a branch at this point in time is unnecessary work.

If you are concerned about memory usage of your installs, I can add a config option and not
instantiate the map. 
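
A toy sketch of what such a switch could look like, in Python rather than the NameNode's Java for brevity. Everything here is hypothetical: the class, the `id_map_enabled` flag and the method names are invented for illustration and are not the actual HDFS code or configuration key.

```python
class InodeDirectory:
    """Toy stand-in for a namespace that optionally indexes inodes by ID."""

    def __init__(self, id_map_enabled=True):
        # When disabled, the map is never allocated, so it costs no heap.
        self._by_id = {} if id_map_enabled else None

    def add(self, inode_id, path):
        # Only maintain the index when the option is on.
        if self._by_id is not None:
            self._by_id[inode_id] = path

    def lookup(self, inode_id):
        if self._by_id is None:
            raise LookupError("inode ID map is disabled by configuration")
        return self._by_id.get(inode_id)
```

The trade-off is that anything relying on ID-based lookup has to fail cleanly, or fall back to paths, when the map is turned off.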



                
> Use InodeID as an identifier of a file in HDFS protocols and APIs
> --------------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> Using InodeID to uniquely identify a file has several benefits. Here are a few of them:
> 1. uniquely identify a file across renames; related JIRAs include HDFS-4258 and HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been replaced or
> renamed to, the combination of file name and size is not reliable, but the combination of
> file id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of filename (HDFS-385).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
