hadoop-hdfs-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7878) API - expose an unique file identifier
Date Fri, 23 Sep 2016 01:01:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514987#comment-15514987 ]

Chris Douglas commented on HDFS-7878:

bq. I'd highlight YARN app submission as a key use case; it currently uses timestamps and gets very confused if you overwrite something, even if its contents are unchanged
YARN {{LocalResourceProto}} could replace some of its metadata with a {{FileStatusProto}}
from HDFS-6984, if it were available. That should include this identifier.

bq. The other way to expose it would be from a byte[] of version info
An opaque {{byte[]}} is awkward in calls that take multiple arguments. A caller would need to pass a {{Map<Path,byte[]>}},
parallel arrays of {{Path}} and {{byte[]}}, or a composite type (kind of redundant, given that {{FileStatus}}
exists). We could add a {{FileSystem#getHandle(FileStatus)}} API to return an opaque, serializable
type that's the minimal set of bytes needed to refer to that entity. I suppose we could mandate {{BikeShed#getPath()}}
so it's not too annoying, but since that type would be a subset of {{FileStatus}}, it only saves a handful of bytes.
Particularly after HDFS-6984, if someone wants to store a few million of them efficiently,
there are better methods for that.

That said, I agree that {{FileStatusProto}} should represent the {{BikeShed}} as bytes.
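
To make the shape of the proposal concrete, here is a minimal, self-contained sketch of a {{getHandle(FileStatus)}}-style call. Everything in it is an assumption for illustration: {{SketchFileStatus}}, {{EntityHandle}} (standing in for {{BikeShed}}), and {{SketchFileSystem}} are hypothetical names, not Hadoop classes, and packing a numeric inode ID into eight bytes is only one possible encoding.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for FileStatus; only the fields this sketch needs. */
final class SketchFileStatus {
    final String path;
    final long inodeId; // assumed backend-specific identity, e.g. an inode ID

    SketchFileStatus(String path, long inodeId) {
        this.path = path;
        this.inodeId = inodeId;
    }
}

/** Hypothetical stand-in for the type called BikeShed in the comment. */
final class EntityHandle {
    private final String path;
    private final byte[] id;

    EntityHandle(String path, byte[] id) {
        this.path = path;
        this.id = id.clone(); // defensive copy so the handle stays immutable
    }

    /** Convenience accessor, as in the BikeShed#getPath() idea. */
    String getPath() {
        return path;
    }

    /** The opaque bytes; callers should not interpret them. */
    byte[] toBytes() {
        return id.clone();
    }
}

final class SketchFileSystem {
    /** Plays the role of the proposed FileSystem#getHandle(FileStatus). */
    static EntityHandle getHandle(SketchFileStatus status) {
        // Assumed encoding: the inode ID as 8 big-endian bytes.
        byte[] id = ByteBuffer.allocate(Long.BYTES).putLong(status.inodeId).array();
        return new EntityHandle(status.path, id);
    }
}
```

With a type like this, a call that refers to several files could take a {{List<EntityHandle>}} instead of a {{Map<Path,byte[]>}} or parallel arrays, since each handle carries its own path.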

bq. Additionally, InodeId looks implementation-specific to me, which makes this API not useful to or be supported natively by other backend

To be clear, does this support using {{FileStatus}} in {{FileSystem}} APIs rather than {{InodeId}},
e.g., {{FileSystem#open(InodeId)}}? Do you agree that we should use a type for {{BikeShed}},
not just a long?
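
One way to picture the "type, not just a long" question is the sketch below. It is an illustration under stated assumptions, not a proposed implementation: {{OpaqueFileId}} and both factory methods are hypothetical names, and the string version tag for a backend without inodes is invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical typed identifier: callers see only opaque bytes and equality. */
final class OpaqueFileId {
    private final byte[] bytes;

    private OpaqueFileId(byte[] bytes) {
        this.bytes = bytes;
    }

    /** A backend with numeric inode IDs can pack a long into the bytes. */
    static OpaqueFileId fromInodeId(long inodeId) {
        return new OpaqueFileId(ByteBuffer.allocate(Long.BYTES).putLong(inodeId).array());
    }

    /** A backend without inodes could pack some other token (assumed here to be a string tag). */
    static OpaqueFileId fromVersionTag(String tag) {
        return new OpaqueFileId(tag.getBytes(StandardCharsets.UTF_8));
    }

    byte[] toBytes() {
        return bytes.clone();
    }

    // Equality is byte-wise; IDs are only meaningful within the filesystem that issued them.
    @Override
    public boolean equals(Object o) {
        return o instanceof OpaqueFileId && Arrays.equals(bytes, ((OpaqueFileId) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

A bare {{long}} in the public API would fix the representation for every backend, whereas a type like this leaves each {{FileSystem}} free to choose what it encodes.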

> API - expose an unique file identifier
> --------------------------------------
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, HDFS-7878.06.patch, HDFS-7878.patch
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten.

