hadoop-hdfs-issues mailing list archives

From "M. C. Srivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3370) HDFS hardlink
Date Thu, 14 Jun 2012 07:34:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294861#comment-13294861 ]

M. C. Srivas commented on HDFS-3370:

@Karthik:  using hard-links for backup accomplishes exactly the opposite of what you want. The
expectation with a correctly-implemented hardlink is that when the original is modified, the
change is reflected in the file, no matter which path-name was used to access it. Isn't that
exactly the opposite effect of what a backup/snapshot is supposed to provide?  Unless of course
you are committing to never ever being able to modify a file once written (although that would
be viewed by most as a major step backwards in the evolution of Hadoop).
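The shared-inode behavior described above can be sketched with Python's os.link on a local POSIX filesystem (not HDFS; the file names are made up for illustration): a hard link is a second directory entry for the same inode, so an in-place modification through one name is visible through every other name.

```python
import os
import tempfile

d = tempfile.mkdtemp()
orig = os.path.join(d, "original.txt")
link = os.path.join(d, "backup.txt")

with open(orig, "w") as f:
    f.write("v1")
os.link(orig, link)          # "back up" the file as a hard link

with open(orig, "w") as f:   # modify the original in place
    f.write("v2")

with open(link) as f:
    print(f.read())          # prints "v2" -- the "backup" changed too
```

This is exactly why a hard-link "snapshot" only preserves history if files are never rewritten after being linked.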

Another major problem is that the scalability of the NN gets reduced by roughly a factor of 10
(ie, your cluster can now hold only 10 million files instead of the 100 million it used to be
able to hold).  Imagine someone doing a backup every 6 hours. Let's say the backups are to
be retained as follows:  4 for the past 24 hrs, 1 daily for a week, and 1 per week for 1 month.
Total: 4 + 7 + 4 = 15 backups, ie, 15 hard-links to the files, one from each backup. So each
file is pointed to by 15 names, or, in other words, the NN now holds 15 names instead of
1 for each file.  I think that would, practically speaking, reduce the number of files the
cluster can hold by a factor of 10, no?
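The retention arithmetic above can be checked in a few lines; the 100-million-name NameNode budget is the figure assumed in the comment, not a measured limit.

```python
# Assumed retention policy from the comment: backups every 6 hours,
# kept as 4 (last 24 hrs) + 7 (one daily for a week) + 4 (one weekly for a month).
retained_backups = 4 + 7 + 4          # = 15 snapshots live at any time
names_per_file = retained_backups     # one hard-link name per retained backup
nn_name_budget = 100_000_000          # assumed NameNode capacity, in names
effective_files = nn_name_budget // names_per_file
print(retained_backups, effective_files)   # 15 names per file -> ~6.7M distinct files
```

100M names / 15 names per file comes to about 6.7 million distinct files, which the comment rounds to "a factor of 10" loss.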

Thirdly, hard-links don't work with directories. What is the scheme to back up directories?
 (If this scheme is only usable for HBase backups and nothing else, then I agree with Konstantin
that it belongs in the HBase layer and not here.)
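The directory restriction is easy to demonstrate on a local Linux filesystem, where link(2) refuses directories with EPERM even for root; again a local-filesystem sketch, not HDFS behavior.

```python
import os
import tempfile

d = tempfile.mkdtemp()
sub = os.path.join(d, "subdir")
os.mkdir(sub)
try:
    # link(2) refuses directories on Linux (EPERM), so a pure
    # hard-link scheme has no direct answer for directory trees.
    os.link(sub, os.path.join(d, "subdir_link"))
    linked = True
except OSError:
    linked = False
print("directory hard link allowed?", linked)   # False on Linux
```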

> HDFS hardlink
> -------------
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLink.pdf
> We'd like to add a new feature, hardlink, to HDFS that allows hardlinked files to share
data without copying. Currently we will support hardlinking only closed files, but it could
be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are the primary ones:
> 1. This provides a lightweight way for applications like HBase to create a snapshot;
> 2. This also allows an application like Hive to move a table to a different directory
without breaking currently running Hive queries.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

