hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7878) API - expose an unique file identifier
Date Mon, 09 Mar 2015 22:18:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353753#comment-14353753 ]

Colin Patrick McCabe commented on HDFS-7878:

Thanks for tackling this!

One big concern, though: having a separate getFileId call will create a lot of TOCTOU
(time-of-check, time-of-use) race conditions.  What if the file is deleted and re-created
between calling getFileStatus and calling getFileId?  Then the client ends up caching the
wrong file's block locations (or other file data).  -1 until we can figure this out.
Sorry for the negativity.
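
To make the race concrete, here is a minimal sketch using a toy in-memory namespace. ToyNamespace, Entry, and all of their methods are hypothetical names invented for this illustration; they are not the HDFS API and do not reflect the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory namespace; a hypothetical stand-in, not the HDFS NameNode. */
class ToyNamespace {
    /** One file's metadata, captured at a single point in time. */
    static final class Entry {
        final long id;
        final long length;

        Entry(long id, long length) {
            this.id = id;
            this.length = length;
        }
    }

    private final Map<String, Entry> files = new HashMap<>();
    private long nextId = 1;

    /** Creates (or overwrites) a file; every creation gets a fresh ID. */
    void create(String path, long length) {
        files.put(path, new Entry(nextId++, length));
    }

    void delete(String path) {
        files.remove(path);
    }

    // Two separate lookups: each is correct alone, but the pair is not atomic.
    long getLength(String path) {
        return files.get(path).length;
    }

    long getFileId(String path) {
        return files.get(path).id;
    }

    // Single lookup: the ID and the metadata come from the same snapshot.
    Entry getStatus(String path) {
        return files.get(path);
    }
}
```

If a client calls getLength("/f"), the file is then deleted and re-created, and the client calls getFileId("/f") afterwards, it ends up pairing the old file's length with the new file's ID. A single getStatus("/f") call cannot produce that mismatch, because both values are read from one entry.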

Why can't we just put this into the file info somewhere?  I don't think the subclass approach
is a bad one.  To avoid casting, we could also have an accessor in the superclass that returns
0 (or throws an exception) when the ID is not available.
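
A rough sketch of the accessor-in-the-superclass idea follows. The class names SketchFileStatus and SketchHdfsFileStatus, the ID_UNAVAILABLE constant, and the hasFileId helper are all assumptions made up for illustration; this is not the actual Hadoop class hierarchy, and it shows only the "return 0" variant, not the "throw an exception" one.

```java
/** Hypothetical generic file status; not the real Hadoop class. */
class SketchFileStatus {
    /** Sentinel meaning "this file system does not expose an ID". */
    static final long ID_UNAVAILABLE = 0L;

    private final String path;
    private final long length;

    SketchFileStatus(String path, long length) {
        this.path = path;
        this.length = length;
    }

    String getPath() {
        return path;
    }

    long getLength() {
        return length;
    }

    /** Superclass accessor, so callers never need to cast to a subclass. */
    long getFileId() {
        return ID_UNAVAILABLE;
    }

    /** Convenience check for callers that want to use the ID as a cache key. */
    boolean hasFileId() {
        return getFileId() != ID_UNAVAILABLE;
    }
}

/** Hypothetical HDFS-specific subclass that carries a real ID. */
class SketchHdfsFileStatus extends SketchFileStatus {
    private final long fileId;

    SketchHdfsFileStatus(String path, long length, long fileId) {
        super(path, length);
        this.fileId = fileId;
    }

    @Override
    long getFileId() {
        return fileId;
    }
}
```

Because the ID travels inside the same status object as the rest of the file info, a caller gets both from a single call and there is no window for the file to be replaced in between.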

> API - expose an unique file identifier
> --------------------------------------
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA
> it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be derived from
> block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct when file
> is overwritten.

