Date: Tue, 8 May 2012 16:39:49 +0000 (UTC)
From: "Daryn Sharp (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <646368503.39490.1336495189720.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2068361121.27846.1336151688971.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HDFS-3370) HDFS hardlink

    [ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270589#comment-13270589 ]

Daryn Sharp commented on HDFS-3370:
-----------------------------------

While I really like the idea of hardlinks, I believe there are more non-trivial considerations with this proposed implementation. I'm by no means an SME, but I experimented with a very different approach a while ago. Here are some of the issues I encountered:

I think the quota considerations may be a bit trickier. The original creator of the file takes the nsquota & dsquota hit, while the links take just the nsquota hit. However, when the original creator's link is removed, one of the other links must absorb the dsquota. If there are multiple remaining links, which one takes the hit? What if none of the remaining links has available quota? If the dsquota can always be exceeded, I can bypass my quota by creating the file in one dir, hardlinking from my out-of-dsquota dir, then removing the original. If the dsquota cannot be exceeded, I can (maliciously?) hardlink from my out-of-dsquota dir to deny the original creator the ability to delete the file -- perhaps leaving them unable to reduce their quota usage. (The transfer-on-delete problem is sketched in code below.)

Block management will also be impacted. The block manager currently operates on a block-to-inode mapping (though that is changing to an interface), but which of the hardlink inodes will it reference? The original? When that link is removed, how will the block manager be updated to point at another hardlink's inode?
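To make the quota-transfer question above concrete, here is a minimal sketch of the accounting dilemma. This is not HDFS code; HardlinkGroup, DirQuota, and the "one directory pays the dsquota, the others pay only nsquota" model are assumptions taken from the description above.

// Hypothetical sketch only, not HDFS code: HardlinkGroup and DirQuota are
// invented names used to illustrate the dsquota transfer-on-delete question.
import java.util.ArrayList;
import java.util.List;

class DirQuota {
  long dsQuota;     // configured disk-space quota in bytes
  long dsConsumed;  // disk space currently charged to this directory

  boolean canAbsorb(long bytes) {
    return dsConsumed + bytes <= dsQuota;
  }
}

class HardlinkGroup {
  DirQuota chargedDir;                                // directory currently paying the dsquota
  List<DirQuota> otherLinkDirs = new ArrayList<>();   // directories of the remaining links
  long fileBytes;                                     // size of the shared blocks

  // Called when the link in the charged directory is deleted.  Some remaining
  // link must absorb the disk-space charge -- but which one, and what if none
  // has room?  Returning false corresponds to the two unattractive options in
  // the comment: block the delete, or let a quota be silently exceeded.
  boolean transferChargeOnDelete() {
    for (DirQuota dir : otherLinkDirs) {
      if (dir.canAbsorb(fileBytes)) {
        dir.dsConsumed += fileBytes;
        chargedDir.dsConsumed -= fileBytes;
        chargedDir = dir;
        return true;
      }
    }
    return false;
  }
}

Whichever directory that loop happens to pick, the result shows up to users as a surprise change in their quota usage, so the first-fit choice above is only one of several arguable policies.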
When a file is open for writing, the inode converts to under construction, so there would need to be an under-construction hardlink inode as well. You will have to think about how the other hardlinks are affected and handled; the same applies to hardlinks taken during file creation and appends. There may also be an impact on file leases. I believe they are path based, so leases will now need to be enforced across multiple paths. What if one hardlink changes the replication factor? The maximum replication factor across all hardlinks should probably be obeyed, but then a setrep that lowers the value will never succeed, since the command waits for the replication value to actually change (a minimal sketch of this "max wins" rule follows the quoted issue below).

> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLinks.pdf
>
>
> We'd like to add a new hardlink feature to HDFS that allows hardlinked files to share data without copying. Currently we will support hardlinking only closed files, but it could be extended to unclosed files as well.
> Among the many potential use cases of the feature, the following two are the primary ones at Facebook:
> 1. It provides a lightweight way for applications like HBase to create a snapshot;
> 2. It allows an application like Hive to move a table to a different directory without breaking currently running Hive queries.
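Returning to the replication-factor question in the comment above, here is a minimal sketch of the "maximum replication wins" rule. Again, this is hypothetical rather than HDFS code; HardlinkReplication and effectiveReplication are invented names.

// Hypothetical sketch only: if every hardlink may carry its own requested
// replication, the shared blocks presumably have to be kept at the maximum
// of those values, which is why a setrep that lowers one link's value may
// never be observed to take effect.
import java.util.Collections;
import java.util.List;

class HardlinkReplication {
  // Effective replication of the shared blocks under a "max wins" rule.
  static short effectiveReplication(List<Short> requestedPerLink) {
    return Collections.max(requestedPerLink);
  }

  public static void main(String[] args) {
    // Links request 3, 5 and 2 replicas; the shared blocks stay at 5, so a
    // "setrep -w 2" issued through the third link would wait forever.
    System.out.println(effectiveReplication(List.of((short) 3, (short) 5, (short) 2)));
  }
}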