Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Tue, 26 Mar 2013 23:49:15 +0000 (UTC)
From: "Todd Lipcon (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12631775.1360608342450.59990.1364341755822@arcas>
In-Reply-To: <JIRA.12631775.1360608342450@arcas>
References: <JIRA.12631775.1360608342450@arcas>
Subject: [jira] [Commented] (HDFS-4489) Use InodeID as an identifier of a
 file in HDFS protocols and APIs
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614692#comment-13614692 ] 

Todd Lipcon commented on HDFS-4489:
-----------------------------------

bq. Inode size is ~180 bytes and this proposal adds 16-24 bytes per Inode.

How is this calculated? I see the following 5 fields:

{code}
  private byte[] name = null;
  private long permission = 0L;
  protected INodeDirectory parent = null;
  protected long modificationTime = 0L;
  protected long accessTime = 0L;
{code}

for a total of 40 bytes on a 64-bit JVM. So adding 16-24 bytes is a pretty substantial new memory use: a 40-60% increase over those 40 bytes.
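The 40-byte figure can be checked by summing the field sizes: two references (`name`, `parent`) and three `long`s (`permission`, `modificationTime`, `accessTime`). The sketch below does that arithmetic. It assumes 8-byte references, i.e. a 64-bit JVM without compressed oops. Like the estimate above, it ignores the object header, alignment padding, and the `byte[]` that `name` points to. `InodeFieldFootprint` is a hypothetical helper, not part of HDFS.

```java
// Sketch: sum the sizes of the five INode fields quoted above.
// Assumptions (not from the HDFS source): 8-byte object references
// (64-bit JVM, no compressed oops); object header, alignment padding,
// and the byte[] referenced by `name` are ignored, as in the 40-byte estimate.
public class InodeFieldFootprint {
    static final int REFERENCE_FIELDS = 2; // name (byte[] reference), parent
    static final int LONG_FIELDS = 3;      // permission, modificationTime, accessTime

    /** Total bytes taken by the five fields for a given reference size. */
    static int fieldBytes(int referenceBytes) {
        return REFERENCE_FIELDS * referenceBytes + LONG_FIELDS * Long.BYTES;
    }

    /** Relative growth, in percent, when {@code added} bytes are added to {@code base}. */
    static double growthPercent(int base, int added) {
        return 100.0 * added / base;
    }

    public static void main(String[] args) {
        int base = fieldBytes(8);
        System.out.println("fields: " + base + " bytes");
        System.out.println("+16 bytes: " + growthPercent(base, 16) + "%");
        System.out.println("+24 bytes: " + growthPercent(base, 24) + "%");
    }
}
```

Running it prints `fields: 40 bytes`, `+16 bytes: 40.0%` and `+24 bytes: 60.0%`. Passing a different reference size to `fieldBytes` shows how the estimate depends on that assumption.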

I agree with ATM that this should go on a branch since it's fairly invasive. Once the branch is working, we can evaluate the benefit of the new feature vs. the measured cost (both memory and the additional CPU needed to manage this new structure).
                
> Use InodeID as an identifier of a file in HDFS protocols and APIs
> -----------------------------------------------------------------
>
>                 Key: HDFS-4489
>                 URL: https://issues.apache.org/jira/browse/HDFS-4489
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Brandon Li
>            Assignee: Brandon Li
>
> Using InodeID to uniquely identify a file has multiple benefits. Here are a few of them:
> 1. Uniquely identifying a file across renames; related JIRAs include HDFS-4258 and HDFS-4437.
> 2. Modification checks in tools like distcp. Because a file could have been replaced, or another file renamed to its path, the combination of file name and size is not reliable, but the combination of file ID and size is unique.
> 3. ID-based protocol support (e.g., NFS).
> 4. Making the pluggable block placement policy use the file ID instead of the file name (HDFS-385).
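
Benefits 1 and 2 from the description can be illustrated with a toy namespace. The sketch below assumes that every created file gets a fresh, never-reused inode ID and that rename moves the path without changing the ID. `InodeIdSketch`, `FileStatus` and the two check methods are hypothetical illustrations, not the HDFS or distcp API. It shows that a name-and-size check misses a same-size replacement, while an ID-and-size check catches it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified namespace for illustration only; this is NOT the
// HDFS API. Assumption: each created file gets a fresh, never-reused inode ID,
// and rename moves the path without changing the ID.
public class InodeIdSketch {
    static final class FileStatus {
        final long inodeId;
        final long length;
        FileStatus(long inodeId, long length) {
            this.inodeId = inodeId;
            this.length = length;
        }
    }

    private final Map<String, FileStatus> byPath = new HashMap<>();
    private long nextInodeId = 1;

    FileStatus create(String path, long length) {
        FileStatus st = new FileStatus(nextInodeId++, length);
        byPath.put(path, st);
        return st;
    }

    void delete(String path) {
        byPath.remove(path);
    }

    void rename(String src, String dst) {
        FileStatus st = byPath.remove(src);
        if (st == null) {
            throw new IllegalArgumentException("no such file: " + src);
        }
        byPath.put(dst, st); // same object, so the inode ID is preserved
    }

    FileStatus stat(String path) {
        return byPath.get(path);
    }

    /** Name-and-size check: both statuses were read from the same path. */
    static boolean unchangedByNameAndSize(FileStatus before, FileStatus after) {
        return before.length == after.length;
    }

    /** ID-and-size check: also requires the inode ID to match. */
    static boolean unchangedByIdAndSize(FileStatus before, FileStatus after) {
        return before.inodeId == after.inodeId && before.length == after.length;
    }

    public static void main(String[] args) {
        InodeIdSketch ns = new InodeIdSketch();
        FileStatus original = ns.create("/data/a", 1024);

        // Benefit 1: the ID identifies the file across a rename.
        ns.rename("/data/a", "/data/b");
        System.out.println(ns.stat("/data/b").inodeId == original.inodeId);

        // Benefit 2: replace /data/b with a different file of the same size.
        FileStatus before = ns.stat("/data/b");
        ns.delete("/data/b");
        ns.create("/data/b", 1024);
        FileStatus after = ns.stat("/data/b");
        System.out.println(unchangedByNameAndSize(before, after)); // replacement missed
        System.out.println(unchangedByIdAndSize(before, after));   // replacement detected
    }
}
```

Running `main` prints `true`, `true`, `false`. The ID survives the rename, and only the ID-based check notices that `/data/b` now holds a different file.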
