hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file
Date Thu, 16 Jul 2009 08:03:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731849#action_12731849 ]

dhruba borthakur commented on HDFS-487:

> So truncating a file would change the fileid?

Truncating a file does not change the fileid. No operation can change the
fileid of an existing file. The fileid is associated with a file at file creation time. If
you delete a file and then recreate a file with the same pathname, the new file will get a
new fileid. I mention truncate to illustrate that the heuristic used
by the "distcp -update" option might not work very well once hdfs supports truncates. "distcp
-update" could use the fileid to reduce the probability of missing modified files.
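
To make the delete-and-recreate case concrete, here is a minimal sketch of a toy namespace. Everything in it is hypothetical (FileMeta, ToyNamespace, and both comparison helpers are not HDFS classes), and the size-only check is only an assumed stand-in for the kind of heuristic "distcp -update" relies on, not its actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these classes exist in HDFS.
final class FileIdSketch {

    /** Hypothetical metadata for one file: an id fixed at creation, plus a length. */
    static final class FileMeta {
        final long fileId;
        final long length;

        FileMeta(long fileId, long length) {
            this.fileId = fileId;
            this.length = length;
        }
    }

    /** Hypothetical namespace that hands out a fresh fileid on every create. */
    static final class ToyNamespace {
        private long nextFileId = 1;
        private final Map<String, FileMeta> files = new HashMap<>();

        FileMeta create(String path, long length) {
            FileMeta meta = new FileMeta(nextFileId++, length);
            files.put(path, meta);
            return meta;
        }

        void delete(String path) {
            files.remove(path);
        }

        /** Truncate keeps the fileid and only changes the length. */
        FileMeta truncate(String path, long newLength) {
            FileMeta old = files.get(path);
            FileMeta meta = new FileMeta(old.fileId, newLength);
            files.put(path, meta);
            return meta;
        }
    }

    /** Assumed stand-in for a size-only "unchanged?" heuristic. */
    static boolean sameBySize(FileMeta a, FileMeta b) {
        return a.length == b.length;
    }

    /** Same check, but also requires the fileid to match. */
    static boolean sameBySizeAndFileId(FileMeta a, FileMeta b) {
        return a.fileId == b.fileId && a.length == b.length;
    }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace();
        FileMeta copied = ns.create("/data/part-0", 100);    // state at the last copy

        ns.delete("/data/part-0");
        FileMeta recreated = ns.create("/data/part-0", 100); // same path, same length

        System.out.println("size only:       " + sameBySize(copied, recreated));
        System.out.println("size and fileid: " + sameBySizeAndFileId(copied, recreated));
    }
}
```

In this model a recreated file of the same length slips past the size-only check but not past the fileid check. A truncate followed by writes back to the original length keeps the same fileid, so the fileid only narrows the window; it does not close it.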

> I am still not clear about block placement use case.. may be it can use id of the first
> block (it comes for free).

The full identifier of a block is a concatenation of a 64 bit blockid and a 64 bit generation stamp.
An error while writing to a block causes the generation stamp of that block to be modified.
So, the identifier of the first block of a file does not remain fixed for the lifetime of that
file. That means it cannot be used as a unique identifier for a file.
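
A rough sketch of why the first block makes a poor file identifier. ToyBlock and ToyFile are hypothetical names, not Hadoop classes, and modelling the generation stamp change as an increment is an assumption made for illustration.

```java
import java.util.Objects;

// Toy model only: ToyBlock and ToyFile are hypothetical, not Hadoop classes.
final class BlockIdentitySketch {

    /** A block identified by the pair (blockId, generationStamp). */
    static final class ToyBlock {
        final long blockId;
        final long generationStamp;

        ToyBlock(long blockId, long generationStamp) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
        }

        /** Models a write error: the stamp changes (assumed: +1), blockId stays. */
        ToyBlock afterWriteError() {
            return new ToyBlock(blockId, generationStamp + 1);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ToyBlock)) {
                return false;
            }
            ToyBlock other = (ToyBlock) o;
            return blockId == other.blockId
                    && generationStamp == other.generationStamp;
        }

        @Override
        public int hashCode() {
            return Objects.hash(blockId, generationStamp);
        }
    }

    /** A file with an id fixed at creation and a replaceable first block. */
    static final class ToyFile {
        final long fileId;
        ToyBlock firstBlock;

        ToyFile(long fileId, ToyBlock firstBlock) {
            this.fileId = fileId;
            this.firstBlock = firstBlock;
        }
    }

    public static void main(String[] args) {
        ToyFile file = new ToyFile(42L, new ToyBlock(1001L, 1L));
        ToyBlock before = file.firstBlock;

        // A write error on the first block replaces its identity.
        file.firstBlock = before.afterWriteError();

        System.out.println("first block unchanged: " + before.equals(file.firstBlock));
        System.out.println("fileid: " + file.fileId);
    }
}
```

Anything keyed on the first block's full identity before the error no longer finds the file afterwards, while a lookup keyed on the fileid still does.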

> (3) separation of block management.

UUIDs probably make it somewhat futureproof, but we can also upgrade the unique-within-filesystem-fileid
to a globally-unique-fileid when the use case arises. Such an upgrade will be easy to do.
(The tradeoff is using more memory in the NN)

> HDFS should expose a fileid to uniquely identify a file
> -------------------------------------------------------
>                 Key: HDFS-487
>                 URL: https://issues.apache.org/jira/browse/HDFS-487
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: fileid1.txt
> HDFS should expose an id that uniquely identifies a file. This helps in developing applications
> that work correctly even when files are moved from one directory to another. A typical use-case
> is to make the Pluggable Block Placement Policy (HDFS-385) use fileid instead of filename.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
