hadoop-hdfs-issues mailing list archives

From "Liyin Tang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3370) HDFS hardlink
Date Tue, 08 May 2012 20:47:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270805#comment-13270805 ]

Liyin Tang commented on HDFS-3370:
----------------------------------

bq. I agree that ds quota doesn't need to be changed when there are links in the same directory.
I'm referring to the case of hardlinks across directories, i.e. /dir/dir2/file and /dir/dir3/hardlink.
If dir2 and dir3 have separate ds quotas, then dir3 has to absorb the ds quota when the original
file is removed from dir2. What if there is a /dir/dir4/hardlink2? Does dir3 or dir4 absorb
the ds quota? What if neither has the necessary quota available?

Based on the same example you gave: when linking /dir/dir2/file to /dir/dir3/hardlink,
it will increase the ds quota usage for dir3 but not for /dir, because dir3 is NOT a common
ancestor while /dir is. If dir3 does not have enough ds quota, a quota exception will be thrown.
Similarly, if a /dir/dir4/hardlink2 is created, dir4 absorbs the ds quota as well. So the point
is that the ds quota is only charged at link creation time and released at link deletion time.
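
To make that accounting rule concrete, here is a minimal sketch (plain Java, not the actual NameNode code; the Dir class and chargeOnLinkCreate method are hypothetical) of charging ds usage to every ancestor of the new link that lies strictly below the lowest common ancestor with the original file, and failing up front when a quota would be exceeded:

{code:java}
import java.util.*;

public class HardlinkQuotaSketch {

    // Hypothetical directory node with a ds quota and current usage.
    static class Dir {
        final String name;
        final Dir parent;
        long dsQuota;       // -1 means unlimited
        long dsUsage;
        Dir(String name, Dir parent, long dsQuota) {
            this.name = name; this.parent = parent; this.dsQuota = dsQuota;
        }
        List<Dir> ancestorsInclusive() {
            List<Dir> list = new ArrayList<>();
            for (Dir d = this; d != null; d = d.parent) list.add(d);
            return list;
        }
        public String toString() { return name; }
    }

    // Charge fileSize to every ancestor of linkParent that is not an ancestor
    // of srcParent, i.e. everything strictly below the common ancestor.
    static void chargeOnLinkCreate(Dir srcParent, Dir linkParent, long fileSize) {
        Set<Dir> srcAncestors = new HashSet<>(srcParent.ancestorsInclusive());
        List<Dir> toCharge = new ArrayList<>();
        for (Dir d : linkParent.ancestorsInclusive()) {
            if (srcAncestors.contains(d)) break;   // reached the common ancestor
            toCharge.add(d);
        }
        // Verify all quotas first so the operation fails without partial charges.
        for (Dir d : toCharge) {
            if (d.dsQuota >= 0 && d.dsUsage + fileSize > d.dsQuota) {
                throw new IllegalStateException("ds quota exceeded on " + d);
            }
        }
        for (Dir d : toCharge) d.dsUsage += fileSize;
    }

    public static void main(String[] args) {
        Dir root = new Dir("/", null, -1);
        Dir dir  = new Dir("/dir", root, -1);
        Dir dir2 = new Dir("/dir/dir2", dir, 100);
        Dir dir3 = new Dir("/dir/dir3", dir, 100);
        dir2.dsUsage = 50;                       // /dir/dir2/file occupies 50 bytes
        chargeOnLinkCreate(dir2, dir3, 50);      // link into /dir/dir3
        System.out.println("dir3 usage = " + dir3.dsUsage);  // 50
        System.out.println("dir  usage = " + dir.dsUsage);   // unchanged: common ancestor not charged
    }
}
{code}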


From my understanding, the basic semantics of a hardlink is to allow the user to create multiple
logical files referencing the same set of blocks/bytes on disk, so the user could set different
file-level attributes for each linked file, such as owner, permission, and modification time.
Since these linked files share the same set of blocks, block-level settings shall be shared.
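
As a rough illustration of that separation (assumed class names, not the real INode hierarchy), each linked file could carry its own file-level attributes while pointing at one shared block-level record:

{code:java}
import java.util.*;

public class HardlinkSemanticsSketch {

    // Hypothetical shared, block-level state (blocks and replication factor).
    static class SharedBlocks {
        final List<Long> blockIds;
        short replication;
        SharedBlocks(List<Long> blockIds, short replication) {
            this.blockIds = blockIds; this.replication = replication;
        }
    }

    // Hypothetical per-link, file-level state.
    static class LinkedFile {
        final String path;
        String owner;
        String permission;
        long modificationTime;
        final SharedBlocks blocks;   // shared with every other link
        LinkedFile(String path, String owner, SharedBlocks blocks) {
            this.path = path; this.owner = owner;
            this.permission = "rw-r--r--";
            this.modificationTime = System.currentTimeMillis();
            this.blocks = blocks;
        }
    }

    public static void main(String[] args) {
        SharedBlocks shared = new SharedBlocks(Arrays.asList(1001L, 1002L), (short) 3);
        LinkedFile file = new LinkedFile("/dir/dir2/file", "alice", shared);
        LinkedFile link = new LinkedFile("/dir/dir3/hardlink", "bob", shared);

        link.permission = "r--r--r--";            // file-level change: independent per link
        shared.replication = (short) 5;           // block-level change: visible to both links

        System.out.println(file.permission + " vs " + link.permission);
        System.out.println(file.blocks == link.blocks);  // true: same underlying blocks
    }
}
{code}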

It may be a little confusing whether the replication factor in HDFS is a file-level attribute
or a block-level attribute.
If we agree that the replication factor is a block-level attribute, then we shall pay the overhead
(wait time) when increasing it, just as when increasing the replication factor of a regular file,
and the setReplication operation is supposed to fail if it would break the ds quota.
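
A small sketch of that behavior, under the assumption that each directory charged for a link keeps a ds quota entry (names here are illustrative, not HDFS's actual quota classes): setReplication computes the extra bytes implied by the higher replication factor, verifies every affected quota first, and only then applies the change.

{code:java}
public class SetReplicationSketch {

    // Hypothetical quota record for one directory that is charged for a link.
    static class QuotaEntry {
        final String dir;
        final long dsQuota;   // bytes allowed
        long dsUsage;         // bytes currently charged
        QuotaEntry(String dir, long dsQuota, long dsUsage) {
            this.dir = dir; this.dsQuota = dsQuota; this.dsUsage = dsUsage;
        }
    }

    // Raising replication multiplies the disk footprint of the shared blocks,
    // so every directory charged for a link must be able to absorb the extra bytes.
    static void setReplication(QuotaEntry[] chargedDirs, long blockBytes,
                               short oldRep, short newRep) {
        long delta = blockBytes * (newRep - oldRep);
        for (QuotaEntry q : chargedDirs) {            // verify before applying
            if (q.dsUsage + delta > q.dsQuota) {
                throw new IllegalStateException("ds quota exceeded on " + q.dir);
            }
        }
        for (QuotaEntry q : chargedDirs) {
            q.dsUsage += delta;
        }
    }

    public static void main(String[] args) {
        QuotaEntry dir2 = new QuotaEntry("/dir/dir2", 1000, 300);
        QuotaEntry dir3 = new QuotaEntry("/dir/dir3", 400, 300);
        try {
            // 100-byte blocks, replication 3 -> 5: each charged dir needs 200 more bytes.
            setReplication(new QuotaEntry[]{dir2, dir3}, 100, (short) 3, (short) 5);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());   // /dir/dir3 cannot absorb the extra bytes
        }
    }
}
{code}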

                
> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLinks.pdf
>
>
> We'd like to add a new feature, hardlink, to HDFS that allows hardlinked files to share
> data without copying. Currently we will support hardlinking only for closed files, but it
> could be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are primarily used at
> Facebook:
> 1. This provides a lightweight way for applications like HBase to create a snapshot;
> 2. This also allows an application like Hive to move a table to a different directory
> without breaking currently running Hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
