hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3370) HDFS hardlink
Date Tue, 08 May 2012 16:39:49 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270589#comment-13270589 ]

Daryn Sharp commented on HDFS-3370:
-----------------------------------

While I really like the idea of hardlinks, I believe there are more non-trivial considerations
with this proposed implementation.  I'm by no means an SME, but I experimented with a very
different approach a while ago.  Here are some of the issues I encountered:

I think the quota considerations may be a bit trickier.  The original creator of the file
takes both the nsquota and dsquota hit.  The links take just the nsquota hit.  However, when
the original creator's link is removed, one of the other links must absorb the dsquota.
If there are multiple remaining links, which one takes the hit?
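
To make that question concrete, here is a toy model of owner-charged accounting (the Dir
and HardlinkedFile types are made up for illustration, not HDFS code):

    // Toy sketch: the owning link is charged the full dsquota; other links
    // cost only a namespace entry.  Deleting the owner must pick a successor.
    import java.util.ArrayList;
    import java.util.List;

    class QuotaSketch {
      static class Dir {
        final long dsQuota;   // disk-space quota for this directory
        long dsUsed;          // bytes currently charged here
        Dir(long dsQuota) { this.dsQuota = dsQuota; }
        boolean canAbsorb(long bytes) { return dsUsed + bytes <= dsQuota; }
      }

      static class HardlinkedFile {
        final long bytes;
        final List<Dir> links = new ArrayList<>();
        Dir owner;            // the one link charged for the space

        HardlinkedFile(long bytes, Dir owner) {
          this.bytes = bytes;
          this.owner = owner;
          links.add(owner);
          owner.dsUsed += bytes;
        }

        void deleteOwnerLink() {
          links.remove(owner);
          owner.dsUsed -= bytes;
          // Which remaining link takes the hit?  The first with room?  And
          // if none has room: fail the delete, or let quota be exceeded?
          for (Dir d : links) {
            if (d.canAbsorb(bytes)) { d.dsUsed += bytes; owner = d; return; }
          }
          throw new IllegalStateException("no link can absorb the dsquota");
        }
      }
    }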

What if none of the remaining links have available quota?  If the dsquota can always be exceeded,
I can bypass my quota by creating the file in one dir, hardlinking to it from my out-of-dsquota
dir, then removing the original.  If the dsquota cannot be exceeded, I can (maliciously?)
hardlink from my out-of-dsquota dir to deny the original creator the ability to delete the
file -- perhaps leaving them unable to reduce their quota usage.
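
Spelling out that sequence -- assuming a purely hypothetical
DistributedFileSystem#hardLink(src, dst) call, which does not exist in any release -- the
bypass would look like:

    // Hypothetical repro; hardLink() is imaginary and left commented out so
    // the sketch compiles.  Shows the dsquota bypass if the quota may be
    // exceeded when the owning link is removed.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    class QuotaBypassSketch {
      static void bypass(DistributedFileSystem dfs) throws Exception {
        Path src  = new Path("/user/me/roomy/big");  // dir with dsquota headroom
        Path link = new Path("/user/me/full/big");   // dir already out of dsquota
        dfs.create(src).close();                     // charged to /user/me/roomy
        // dfs.hardLink(src, link);                  // hypothetical: no dsquota hit
        dfs.delete(src, false);                      // /user/me/full absorbs the bytes
      }
    }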

Block management will also be impacted.  The block manager currently maps each block to a
single inode (though that is changing to an interface), but which of the hardlinked inodes
will it map to?  The original?  When that link is removed, how will the block manager be
updated to reference another hardlink's inode?
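
Roughly, the options look like this (stand-in types, not the real BlockManager):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    class BlocksMapSketch {
      static class Inode {}
      static class Block {}

      // Today's shape: each block resolves to exactly one inode.
      final Map<Block, Inode> blockToInode = new HashMap<>();

      // Option A: record every link's inode per block -- bloats a structure
      // that already dominates namenode heap.
      final Map<Block, Set<Inode>> blockToLinks = new HashMap<>();

      // Option B: keep one "primary" inode and re-point it when that link is
      // deleted.  How is the replacement chosen, and how does the swap stay
      // consistent with the namespace change?
      void onLinkDeleted(Block b, Inode deleted, Set<Inode> remaining) {
        if (blockToInode.get(b) == deleted && !remaining.isEmpty()) {
          blockToInode.put(b, remaining.iterator().next());
        }
      }
    }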

When a file is open for writing, its inode converts to under construction, so there would
need to be an under-construction state that is aware of hardlinks.  You will have to think
about how the other hardlinks are affected and handled.  The same concern applies to
hardlinks taken while a file is being created or appended to.
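
A toy model of the aliasing problem (the types are simplified stand-ins for INodeFile and
INodeFileUnderConstruction):

    import java.util.List;

    class UnderConstructionSketch {
      static class INodeFile {}                      // closed, complete file
      static class INodeFileUC extends INodeFile {}  // file with an active writer

      // Several namespace entries alias one inode.  Opening for write swaps
      // the shared inode, so every link must be found and re-pointed in the
      // same namesystem operation.
      static class LinkEntry { INodeFile target; }

      static void openForWrite(List<LinkEntry> links, INodeFile shared) {
        INodeFile uc = new INodeFileUC();
        for (LinkEntry e : links) {
          if (e.target == shared) {
            e.target = uc;  // miss one and it points at a stale inode
          }
        }
      }
    }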

There may also be an impact on file leases.  I believe they are path based, so leases would
now need to be enforced across multiple paths.
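
For example, a path-keyed lease table like this toy one cannot see a lease held through a
sibling link (made-up types, not the real LeaseManager):

    import java.util.HashMap;
    import java.util.Map;

    class LeaseSketch {
      final Map<String, String> leaseByPath = new HashMap<>();  // path -> holder

      boolean tryAcquire(String path, String holder) {
        // Checks only this path.  A writer holding a lease through another
        // hardlink to the same inode is invisible here, so the single-writer
        // guarantee breaks unless leases are resolved per inode instead.
        return leaseByPath.putIfAbsent(path, holder) == null;
      }
    }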

What if one hardlink changes the replication factor?  The maximum replication factor across
all hardlinks should probably be honored, but then setrep will never succeed when another
link holds a higher factor, since the command waits for the replication value to actually change.
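
A sketch of why, assuming the effective factor is the max over all links:

    class ReplicationSketch {
      // Effective replication as the max of every hardlink's requested factor.
      static short effectiveReplication(short[] perLinkFactors) {
        short max = 1;
        for (short f : perLinkFactors) if (f > max) max = f;
        return max;
      }
      // With links at {10, 3}, "setrep -w 3" on the second link blocks
      // forever: the first link pins the blocks at replication 10, so the
      // observed value never drops to 3.
    }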
                
> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLinks.pdf
>
>
> We'd like to add a new feature, hardlink, to HDFS that allows hardlinked files to share
data without copying. Currently we will support hardlinking only closed files, but it could
be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are primarily used at
Facebook:
> 1. This provides a lightweight way for applications like HBase to create a snapshot;
> 2. This also allows an application like Hive to move a table to a different directory
without breaking currently running Hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
