hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4489) Use InodeID as an identifier of a file in HDFS protocols and APIs
Date Tue, 09 Apr 2013 17:56:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626881#comment-13626881 ]

Suresh Srinivas commented on HDFS-4489:
---------------------------------------

bq. The GSet used for InodeID to INode map is also semi-fixed. Is it allocated similarly to
BlocksMap?
Yes. Please see the patch in HDFS-4434. About 1% of heap is used for the GSet.
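
As a rough illustration of what "semi-fixed" sizing could look like, the sketch below picks a power-of-two slot count for a table whose reference array fits in a given percentage of the maximum heap. This is not the code from HDFS-4434 or the actual GSet implementation; the class name, the method name, and the 8-bytes-per-reference figure are assumptions made for the example.

```java
// Sketch only: NOT the actual Hadoop GSet code. All names are hypothetical.
class SemiFixedTableSizing {

    /**
     * Largest power-of-two slot count whose reference array fits within
     * `percent` of `maxHeapBytes`, assuming `bytesPerReference` per slot.
     */
    static long slotCount(long maxHeapBytes, int percent, int bytesPerReference) {
        long budgetBytes = maxHeapBytes / 100 * percent; // memory budget for the table
        long slots = budgetBytes / bytesPerReference;    // slots that fit in the budget
        // Round down to a power of two; never return fewer than one slot.
        return slots <= 0 ? 1 : Long.highestOneBit(slots);
    }

    public static void main(String[] args) {
        // Size against the heap limit of the running JVM, using 1% as in the comment above.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("slots for 1% of heap: " + slotCount(maxHeap, 1, 8));
    }
}
```

The table is sized once from the heap limit rather than grown with the number of entries, which is the sense in which the allocation is fixed up front.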

bq. Simply saying the overhead is insignificant won't convince users. We should explain why
the benefit from having this feature justifies the overhead. I don't think on/off switch is
necessary.
I don't think the assertion here is that the overhead is insignificant. Depending on how the
namespace of a system is laid out, I would expect the overhead to be anywhere from 2% to 5%.

As for the benefits, I laid them out in the main description:

---
This helps in several use cases:
# HDFS can evolve to support ID based protocols such as NFS. We plan to add an experimental
NFS V3 gateway to HDFS using this mechanism. Will post a github link soon.
# InodeID can be used by tools to track a single instance of a file, for caching data
or for tracking and checking for modification based on InodeID, in tools like distcp.
# Path cannot identify a unique instance of a file. This causes issues as described in HDFS-4258
and HDFS-4437. It has also been a requirement of many other jiras such as HDFS-385.
# Using InodeID as an identifier instead of a path can be more efficient than path-based access.
---
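
The point that a path cannot identify a unique instance of a file can be shown with a toy in-memory namespace. This is a minimal sketch, not the HDFS NameNode or any HDFS API: `ToyNamespace`, `Inode`, and the starting ID value are all hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy namespace for illustration only; NOT HDFS code. All names are hypothetical.
class ToyNamespace {

    // One file instance. The id is assigned at creation and never changes.
    static final class Inode {
        final long id;
        final long length;
        Inode(long id, long length) { this.id = id; this.length = length; }
    }

    private final Map<String, Inode> byPath = new HashMap<>();
    private long nextId = 1000; // arbitrary starting value for the sketch

    Inode create(String path, long length) {
        Inode inode = new Inode(nextId++, length);
        byPath.put(path, inode); // replaces whatever was at this path
        return inode;
    }

    void rename(String src, String dst) {
        Inode inode = byPath.remove(src);
        if (inode == null) throw new IllegalArgumentException("no such path: " + src);
        byPath.put(dst, inode); // same inode, same id, new path
    }

    Inode lookup(String path) { return byPath.get(path); }

    // Name-and-size check: cannot tell a replaced file from the original.
    static boolean sameByPathAndSize(String p1, Inode a, String p2, Inode b) {
        return p1.equals(p2) && a.length == b.length;
    }

    // Id-and-size check: distinguishes the two instances.
    static boolean sameByIdAndSize(Inode a, Inode b) {
        return a.id == b.id && a.length == b.length;
    }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace();
        Inode first = ns.create("/data/part-0", 128);
        ns.rename("/data/part-0", "/data/old-part-0");  // first instance moves away
        Inode second = ns.create("/data/part-0", 128);  // new instance, same path and size

        System.out.println(sameByPathAndSize("/data/part-0", first, "/data/part-0", second)); // true
        System.out.println(sameByIdAndSize(first, second));                                   // false
        System.out.println(ns.lookup("/data/old-part-0").id == first.id);                     // true
    }
}
```

A tool that remembered only the path and size would treat the second file as unchanged, while one that remembered the ID would notice the replacement and could still recognise the first file under its new name.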

bq. We have a namenode which will not work well if we upgrade to a release with this feature
since it will need extra 4-6GB for the steady-state operation. Even if it could absorb the
extra memory requirement, we would have to tell users that the namespace limit is X% worse.
Is this because the namenode machine does not have enough RAM? With this change, the NN is
expected to be allocated more memory, say 5% more. If that is done, I am not sure why users
would need to be told that the namespace limit is X% worse.
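
For a sense of scale, the arithmetic behind these percentages can be sketched as below. The heap sizes are example inputs, not figures from this thread, and the class and method names are hypothetical. The second method only shows what heap size a given extra amount would correspond to if it were exactly the stated fraction; the thread does not say how the 4-6GB estimate was derived.

```java
// Back-of-the-envelope arithmetic only; names and inputs are assumptions.
class HeapOverheadEstimate {

    // Extra heap needed when the feature adds `overheadFraction` on top of `heapGb`.
    static double extraGb(double heapGb, double overheadFraction) {
        return heapGb * overheadFraction;
    }

    // Heap size at which `extraGb` would equal `overheadFraction` of the heap.
    static double impliedHeapGb(double extraGb, double overheadFraction) {
        return extraGb / overheadFraction;
    }

    public static void main(String[] args) {
        double[] exampleHeapsGb = {32, 64, 100}; // example values, not from the thread
        for (double heap : exampleHeapsGb) {
            System.out.printf("heap %.0f GB: extra %.2f GB at 2%%, %.2f GB at 5%%%n",
                    heap, extraGb(heap, 0.02), extraGb(heap, 0.05));
        }
        System.out.printf("4 GB extra at 5%% corresponds to a %.0f GB heap%n",
                impliedHeapGb(4, 0.05));
    }
}
```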

My rationale, repeating what I said earlier, is that machines are becoming available with more
RAM. Adding 5% to the JVM heap should not be a problem. In fact, most namenodes are already
configured with enough headroom and might not even need a change. But if this is a big concern,
I am okay with making an additional change to bring the added memory consumption down close to zero.


                
> Use InodeID as an identifier of a file in HDFS protocols and APIs
> --------------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> The benefits of using InodeID to uniquely identify a file are manifold. Here
> are a few of them:
> 1. uniquely identify a file across renames; related JIRAs include HDFS-4258 and HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been replaced or
> renamed to, the file name and size combination is not reliable, but the combination of file
> id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of filename (HDFS-385).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
