hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3370) HDFS hardlink
Date Tue, 08 May 2012 20:03:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270780#comment-13270780 ]

Daryn Sharp commented on HDFS-3370:
-----------------------------------

I'm glad you find my questions helpful!

bq. For example, "ln /root/dir1/file1 /root/dir1/file2" : there is no need to increase the
ds quota usage when creating the link file: file2.  Also "rm /root/dir1/file1" : there is
no need to decrease the ds quota usage when removing the original source file: file1.

I agree that the ds quota doesn't need to change when the links are in the same directory.
 I'm referring to the case of hardlinks across directories, e.g. /dir/dir2/file and /dir/dir3/hardlink.
 If dir2 and dir3 have separate ds quotas, then dir3 has to absorb the ds quota when the original
file is removed from dir2.  What if there is also a /dir/dir4/hardlink2?  Does dir3 or dir4 absorb
the ds quota?  What if neither has the necessary quota available?
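
To make the cross-directory case concrete, here is a toy Python model. It is not HDFS code and not the design proposed in HDFS-3370: the names Dir, LinkedFile and QuotaExceeded are hypothetical, and the policy "the first remaining link's directory absorbs the charge" is only an assumption picked to illustrate the question.

```python
class QuotaExceeded(Exception):
    """Raised when a directory cannot absorb a ds quota charge."""


class Dir:
    """Hypothetical directory with a diskspace (ds) quota, in arbitrary units."""

    def __init__(self, name, ds_quota):
        self.name = name
        self.ds_quota = ds_quota
        self.ds_used = 0

    def charge(self, amount):
        free = self.ds_quota - self.ds_used
        if amount > free:
            raise QuotaExceeded(f"{self.name}: need {amount}, only {free} free")
        self.ds_used += amount

    def release(self, amount):
        self.ds_used -= amount


class LinkedFile:
    """A file whose blocks are shared by hardlinks in several directories.

    Assumed policy (illustration only): the directory of the first link
    in self.links carries the whole ds charge.
    """

    def __init__(self, size, replication, owner):
        self.cost = size * replication  # ds quota counts every replica
        self.links = [owner]
        owner.charge(self.cost)

    def link(self, directory):
        # A new hardlink shares the blocks, so nothing extra is charged.
        self.links.append(directory)

    def unlink(self, directory):
        charged = self.links[0]
        remaining = list(self.links)
        remaining.remove(directory)  # drops the first matching link
        new_charged = remaining[0] if remaining else None
        if new_charged is not charged:
            if new_charged is not None:
                # May raise QuotaExceeded; state is left unchanged if it does.
                new_charged.charge(self.cost)
            charged.release(self.cost)
        self.links = remaining
```

With dir2 at quota 300 and dir3 and dir4 at quota 100 each, a file of size 64 with replication 3 costs 192. Linking it into dir3 and dir4 is free, but removing the original from dir2 then fails under this policy because neither dir3 nor dir4 can absorb 192. Any other policy (splitting the charge, charging every link) has to answer the same question.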

bq.  Currently, at least for V1, we shall support the hardlinking only for the closed files
and won't support to append operation against linked files, but it could be extended in the
future.

A reasonable approach, but it may lead to user confusion.  It almost begs for an immutable
flag (i.e. chattr +i/-i) to prevent inadvertent hard linking to files intended to be mutable.

Nonetheless, I'd suggest exploring the difficulties reconciling the current design of the
namesystem/block management with your design.  It may help avoid boxing ourselves into a corner
with limited hard link support.

bq.  From my understanding, the setReplication is just a memory footprint update and the name
node will increase actual replication in the background.

Yes, but the FsShell setrep command actively monitors the files and does not exit until the
replication factor is what the user requested, as determined by the number of hosts per
block.  Another consideration is that ds quota usage is a multiple of the replication factor, so
who is allowed to change the replication factor of a linked file, given that increasing it may
consume a different user's quota?
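
As a rough illustration of the quota arithmetic (a sketch only; ds_charge and setrep_quota_delta are hypothetical helper names, not HDFS APIs), the change in ds usage caused by a replication change is:

```python
def ds_charge(file_bytes, replication):
    """Diskspace consumed against a ds quota: every replica counts."""
    return file_bytes * replication


def setrep_quota_delta(file_bytes, old_replication, new_replication):
    """Extra (or freed, if negative) ds quota when replication changes."""
    return ds_charge(file_bytes, new_replication) - ds_charge(file_bytes, old_replication)


MB = 1024 * 1024
# Raising a 128 MB file from 3 to 5 replicas needs 256 MB more ds quota.
extra = setrep_quota_delta(128 * MB, 3, 5)
```

If the file is hardlinked from a directory owned by someone else, that extra amount lands on whichever directory carries the charge, which may not belong to the user who ran setrep.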
                
> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLinks.pdf
>
>
> We'd like to add a new feature, hardlink, to HDFS that allows hardlinked files to share
> data without copying. Currently we will support hardlinking only closed files, but it could
> be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are primarily used at
> Facebook:
> 1. This provides a lightweight way for applications like HBase to create a snapshot;
> 2. This also allows an application like Hive to move a table to a different directory
> without breaking currently running Hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
