From: "Daryn Sharp (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Date: Tue, 8 May 2012 20:03:48 +0000 (UTC)
Subject: [jira] [Commented] (HDFS-3370) HDFS hardlink
Message-ID: <481315054.40762.1336507428566.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2068361121.27846.1336151688971.JavaMail.tomcat@hel.zones.apache.org>

[
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270780#comment-13270780 ]

Daryn Sharp commented on HDFS-3370:
-----------------------------------

I'm glad you find my questions helpful!

bq. For example, "ln /root/dir1/file1 /root/dir1/file2" : there is no need to increase the ds quota usage when creating the link file: file2. Also "rm /root/dir1/file1" : there is no need to decrease the ds quota usage when removing the original source file: file1.

I agree that ds quota doesn't need to change when the links are in the same directory. I'm referring to the case of hardlinks across directories, e.g. /dir/dir2/file and /dir/dir3/hardlink. If dir2 and dir3 have separate ds quotas, then dir3 has to absorb the ds quota when the original file is removed from dir2. What if there is a /dir/dir4/hardlink2? Does dir3 or dir4 absorb the ds quota? What if neither has the necessary quota available?

bq. Currently, at least for V1, we shall support the hardlinking only for the closed files and won't support to append operation against linked files, but it could be extended in the future.

A reasonable approach, but it may lead to user confusion. It almost begs for an immutable flag (i.e. chattr +i/-i) to prevent inadvertent hard linking to files intended to be mutable. Nonetheless, I'd suggest exploring the difficulties of reconciling the current design of the namesystem/block management with your design. It may help avoid boxing ourselves into a corner with limited hard link support.

bq. From my understanding, the setReplication is just a memory footprint update and the name node will increase actual replication in the background.

Yes, but the FsShell setrep command actively monitors the files and does not exit until the replication factor is what the user requested, as determined by the number of hosts per block.
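
To make the cross-directory quota question concrete, here is a rough Python sketch of one possible accounting policy. Everything in it is hypothetical: the names `Dir`, `LinkedFile`, `create`, `hardlink` and `unlink` are invented for illustration, the "first remaining directory with enough headroom absorbs the charge" rule is just one assumed answer to the dir3-versus-dir4 question, and none of it reflects the attached design or actual NameNode code. It assumes a file is charged length times replication factor against a single directory at a time.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """No directory can take on the file's ds quota charge."""


@dataclass(eq=False)  # identity comparison: two Dir objects are never "equal"
class Dir:
    name: str
    ds_quota: int      # bytes allowed, counted across all replicas
    ds_used: int = 0

    def headroom(self) -> int:
        return self.ds_quota - self.ds_used


@dataclass(eq=False)
class LinkedFile:
    length: int        # bytes in a single replica
    replication: int
    owner: Dir         # the one directory currently charged (assumption)
    links: list = field(default_factory=list)  # a Dir entry per link

    def charge(self) -> int:
        return self.length * self.replication


def create(d: Dir, length: int, replication: int) -> LinkedFile:
    f = LinkedFile(length, replication, owner=d, links=[d])
    if f.charge() > d.headroom():
        raise QuotaExceeded(d.name)
    d.ds_used += f.charge()
    return f


def hardlink(f: LinkedFile, d: Dir) -> None:
    # Assumed policy: a new link shares blocks, so nothing extra is charged.
    f.links.append(d)


def unlink(f: LinkedFile, d: Dir) -> None:
    remaining = list(f.links)
    remaining.remove(d)
    if d is f.owner and d not in remaining:
        if remaining:
            # The charge has to move; take the first directory that can hold it.
            new_owner = next(
                (c for c in remaining if c.headroom() >= f.charge()), None
            )
            if new_owner is None:
                # The "what if neither has the quota" case: refuse the unlink.
                raise QuotaExceeded("no link directory can absorb the charge")
            new_owner.ds_used += f.charge()
            f.owner = new_owner
        d.ds_used -= f.charge()
    f.links = remaining
```

With dir2, dir3 and dir4 each holding a link, removing the dir2 link under this sketch either moves the whole charge to whichever of dir3 or dir4 is tried first and has room, or fails the removal outright. Both outcomes are likely to surprise the owner of the affected directory, which is the concern raised above.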

Another consideration is that ds quota is based on a multiple of the replication factor, so who is allowed to change the replication factor, given that increasing it may impact a different user's quota?

> HDFS hardlink
> -------------
>
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLinks.pdf
>
>
> We'd like to add a new feature hardlink to HDFS that allows hardlinked files to share data without copying. Currently we will support hardlinking only closed files, but it could be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are primarily used in facebook:
> 1. This provides a lightweight way for applications like hbase to create a snapshot;
> 2. This also allows an application like Hive to move a table to a different directory without breaking current running hive queries.