hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7878) API - expose an unique file identifier
Date Tue, 20 Sep 2016 20:05:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507647#comment-15507647 ]

Steve Loughran commented on HDFS-7878:


* S3a adds an {{S3AFileStatus}} with a new field "isEmptyDir". I've contemplated making it
a subclass of {{LocatedFileStatus}}, so list operations returning those wouldn't need any
new objects.
* Grepping the Hadoop production code, {{FileStatus.getPath}} is often used as the input to FS
operations (open, delete, rename), but not so often for create() or rename destinations.
* Any changes to FileStatus have to be made so that external filesystems (e.g. Google Cloud
Storage) which subclass FileStatus don't break. I know, given the pain Guice causes us it'd
be retaliation, but the GCS team aren't the Guice team, and breaking them would upset users.
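To make the subclassing concern concrete, here is a minimal sketch of one way to add an identifier accessor without breaking existing subclasses: give the new method a non-abstract default. Every class and method name below ({{StatusSketch}}, {{BaseStatus}}, {{ExternalStatus}}, {{IdentifiedStatus}}, {{getFileId}}) is a hypothetical stand-in, not the Hadoop API and not what any patch on this JIRA does.

```java
import java.util.Optional;

/** Hypothetical stand-ins only; none of these are real Hadoop classes. */
class StatusSketch {

    /** Stand-in for a base status class that third parties already subclass. */
    static class BaseStatus {
        private final String path;

        BaseStatus(String path) {
            this.path = path;
        }

        String getPath() {
            return path;
        }

        /**
         * Assumed new accessor. Because it is not abstract, subclasses written
         * before it existed still compile and simply report "no identifier".
         */
        Optional<Long> getFileId() {
            return Optional.empty();
        }
    }

    /** Stand-in for an external subclass that predates getFileId(). */
    static class ExternalStatus extends BaseStatus {
        private final boolean emptyDir;

        ExternalStatus(String path, boolean emptyDir) {
            super(path);
            this.emptyDir = emptyDir;
        }

        boolean isEmptyDir() {
            return emptyDir;
        }
    }

    /** Stand-in for a filesystem that can supply an identifier. */
    static class IdentifiedStatus extends BaseStatus {
        private final long fileId;

        IdentifiedStatus(String path, long fileId) {
            super(path);
            this.fileId = fileId;
        }

        @Override
        Optional<Long> getFileId() {
            return Optional.of(fileId);
        }
    }

    /** Callers must cope with the identifier being absent. */
    static String describe(BaseStatus status) {
        String id = status.getFileId().map(Object::toString).orElse("n/a");
        return status.getPath() + " id=" + id;
    }

    public static void main(String[] args) {
        System.out.println(describe(new ExternalStatus("/data/dir", true)));
        System.out.println(describe(new IdentifiedStatus("/data/file", 16392L)));
    }
}
```

The trade-off in this sketch is that callers cannot assume an identifier exists, so any code using it needs a fallback for filesystems that never override the accessor.
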

Now, the bad news. FileStatus is part of the public FS API, documented in {{FileSystem.md}}.
You're proposing changing it, aren't you? 

Which means you get to update the doc and the tests in {{AbstractContractGetFileStatusTest}}.
And now that I know that this JIRA is planning to change the file, I'm going to be expecting to
see that handled.

> API - expose an unique file identifier
> --------------------------------------
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, HDFS-7878.06.patch, HDFS-7878.patch
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA
> it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be derived from
> block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct when file
> is overwritten.

