hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7878) API - expose an unique file identifier
Date Mon, 16 Mar 2015 22:12:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364067#comment-14364067 ]

Jing Zhao commented on HDFS-7878:

bq. Colin wrote:  if the client makes two different calls to getFileStatus... since we're
doing 2x the RPCs to the NameNode that we need to...

The current patch only makes one getFileStatus RPC.

bq. Colin wrote: Then FileStatus objects returned from HDFS (and any other filesystem that
has user-visible inode IDs) can return the inode ID

This FileStatus object returned from HDFS is called "HdfsFileStatus"...

bq. Colin wrote: We do 1/2 the RPCs of the current patch, put 1/2 the load on the NN, and
don't open up another race condition.

Again, the current patch makes only ONE RPC, and your proposed approach is exactly
what the current patch does, except that HdfsFileStatus already contains the file ID.
Instead of changing or extending a public and stable interface like FileStatus, the easiest
way is to keep this API inside DistributedFileSystem only and simply return the file ID
contained in HdfsFileStatus.
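
A minimal sketch of the shape described above, not the actual patch: one status lookup
against the NameNode, with the file ID read from the status object that comes back.
Every type here (StubHdfsFileStatus, StubNameNode, StubDistributedFileSystem) and the
method name getFileId(String) are hypothetical stand-ins so the example compiles without
Hadoop on the classpath. The stub also assumes that overwriting a path yields a new ID,
which is what the cache-key use case in the issue description relies on.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; none of these are the real Hadoop classes.
public class FileIdSketch {

    /** Stand-in for HdfsFileStatus: the status already carries the file ID. */
    static final class StubHdfsFileStatus {
        private final long fileId;
        private final long length;

        StubHdfsFileStatus(long fileId, long length) {
            this.fileId = fileId;
            this.length = length;
        }

        long getFileId() { return fileId; }
        long getLength() { return length; }
    }

    /** Stand-in for the NameNode; counts status lookups so the RPC cost is visible. */
    static final class StubNameNode {
        private final Map<String, StubHdfsFileStatus> files = new HashMap<>();
        private long nextId = 1000L; // arbitrary starting value for the sketch
        int rpcCount = 0;

        /** Creates or overwrites a path. Assumption: an overwrite gets a fresh ID. */
        void create(String path, long length) {
            files.put(path, new StubHdfsFileStatus(nextId++, length));
        }

        /** The single "RPC": returns the full status, or null if the path is absent. */
        StubHdfsFileStatus getFileInfo(String path) {
            rpcCount++;
            return files.get(path);
        }
    }

    /** Stand-in for DistributedFileSystem, where the HDFS-only API would live. */
    static final class StubDistributedFileSystem {
        private final StubNameNode nameNode;

        StubDistributedFileSystem(StubNameNode nameNode) {
            this.nameNode = nameNode;
        }

        /** One lookup, then read the ID out of the status that was returned. */
        long getFileId(String path) {
            StubHdfsFileStatus status = nameNode.getFileInfo(path);
            if (status == null) {
                throw new UncheckedIOException(new FileNotFoundException(path));
            }
            return status.getFileId();
        }
    }

    public static void main(String[] args) {
        StubNameNode nameNode = new StubNameNode();
        StubDistributedFileSystem fs = new StubDistributedFileSystem(nameNode);

        nameNode.create("/data/part-0", 128L);
        long before = fs.getFileId("/data/part-0");

        nameNode.create("/data/part-0", 256L); // overwrite the same path
        long after = fs.getFileId("/data/part-0");

        System.out.println("ID changed after overwrite: " + (before != after));
        System.out.println("Status lookups issued: " + nameNode.rpcCount);
    }
}
```

Because the ID comes from the same status object that the lookup already returns, the
generic FileStatus interface stays untouched in this sketch and each call costs exactly
one lookup.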

> API - expose an unique file identifier
> --------------------------------------
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA
> it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be derived from
> block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct when file
> is overwritten.

This message was sent by Atlassian JIRA
