hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4489) Use InodeID as an identifier of a file in HDFS protocols and APIs
Date Wed, 01 May 2013 17:42:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646745#comment-13646745 ]

Suresh Srinivas commented on HDFS-4489:

Thanks [~nroberts] and [~daryn] for commenting back.

bq. 100K files via 100 threads seems like a very small sampling when we're running namespaces
well over 100M. I think the only detail that might make performance worse is how well the
inode map performs as the bucket chains get longer. If it's a problem we can probably fix
it later.
100 threads is a considerable load and matches the RPC handler count of a typical large cluster.
Also, the inodeMap is sized as a percentage of total memory, which means it scales with the
size of the namenode. I agree that the performance impact should be minimal, and we should
be able to fix it if we find any issues.

bq. I did notice that unprotectedConcat appears to leak inodes in the map - it unlinks the
concat'ed files but doesn't remove them from the map. 
Nice catch. Created HDFS-4785.
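
For readers following along, the kind of leak described above can be sketched with a toy namespace. This is only an illustration under assumed names (`ToyNamespace`, `Inode`, `concat`); it is not the real unprotectedConcat code or the HDFS-4785 fix.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyNamespace {
    static final class Inode {
        final long id;
        long length;
        Inode(long id, long length) { this.id = id; this.length = length; }
    }

    // Directory view (name -> inode) and id view (inode id -> inode).
    private final Map<String, Inode> children = new HashMap<>();
    private final Map<Long, Inode> inodeMap = new HashMap<>();

    Inode create(String name, long id, long length) {
        Inode inode = new Inode(id, length);
        children.put(name, inode);
        inodeMap.put(id, inode);
        return inode;
    }

    // Appends the sources to the target, then drops the sources.
    void concat(String target, List<String> sources) {
        Inode dst = children.get(target);
        if (dst == null) {
            throw new IllegalArgumentException("no such target: " + target);
        }
        for (String name : sources) {
            Inode src = children.remove(name);   // unlink from the directory
            if (src == null) {
                continue;
            }
            dst.length += src.length;
            // Without this removal the id map would keep a reference to an
            // inode that is no longer reachable by name, i.e. a leak.
            inodeMap.remove(src.id);
        }
    }

    int inodeMapSize() { return inodeMap.size(); }
    long lengthOf(String name) { return children.get(name).length; }
}
```

The point of the sketch is that every path that unlinks an inode from the directory tree also has to remove it from the id map.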

bq. ...so you may want to double check.
Yes. I will run through one more review.

bq. Might want to correct the misspelling: remvoed AllFromInodesFromMap
Will be addressed in another jira.

> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>             Fix For: 2.0.5-beta
>         Attachments: 4434.optimized.patch
> The benefits of using InodeID to uniquely identify a file are manifold. Here are a few of them:
> 1. uniquely identify a file across renames; related JIRAs include HDFS-4258 and HDFS-4437.
> 2. modification checks in tools like distcp. Since a file could have been replaced or be the target of a rename, the combination of file name and size is not reliable, but the combination of file id and size is unique.
> 3. id based protocol support (e.g., NFS)
> 4. to make the pluggable block placement policy use fileid instead of filename (HDFS-385).
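
The modification check in item 2 can be illustrated with a small sketch. The `FileSnapshot` type and both comparison helpers are hypothetical names for this example and do not correspond to distcp or HDFS classes.

```java
class FileIdentityCheck {
    // Hypothetical snapshot of the fields a copy tool might record for a file.
    static final class FileSnapshot {
        final String path;
        final long fileId;
        final long length;
        FileSnapshot(String path, long fileId, long length) {
            this.path = path;
            this.fileId = fileId;
            this.length = length;
        }
    }

    // Name-based check: cannot tell a replaced file from the original
    // when the new file has the same path and size.
    static boolean sameByNameAndSize(FileSnapshot a, FileSnapshot b) {
        return a.path.equals(b.path) && a.length == b.length;
    }

    // Id-based check: a replaced file gets a new id, so the change is visible.
    static boolean sameByIdAndSize(FileSnapshot a, FileSnapshot b) {
        return a.fileId == b.fileId && a.length == b.length;
    }
}
```

For example, if `/data/part-0` is deleted and recreated with the same length, the name-based check still reports the file as unchanged, while the id-based check reports a difference because the new file carries a new id.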

