hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3370) HDFS hardlink
Date Mon, 18 Jun 2012 22:10:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396306#comment-13396306 ]

Konstantin Shvachko commented on HDFS-3370:

> the key question: What services should a file system provide?

Exactly so. I would clarify it as: What functions should be a part of the file system API
and what should be a library function.

> The same argument could be made for symbolic links. The application could implement those
(in fact it's quite simple).

"Simple" is the key point here. Simple functions should be fs APIs. Hard functions should
go into libraries.

Darin, you are right that there is a lot of overlap, and yes, hardlinks simplify building snapshots,
but you are just pushing the complexity onto the HDFS layer. This does not change the difficulty
of the problem.

We relaxed POSIX semantics in many respects in HDFS for simplicity and performance. Imagine
how much easier life would be with random writes or multiple writers. You are not asking for
it, right?

Hardlinks are of a similar nature. They are hard to support if the namespace is distributed.
They should not be an HDFS API, but they could be a library function.
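
To make the "library function" option concrete, here is a minimal sketch of a hardlink table kept outside the file system, assuming link names map to a shared data identifier with a reference count. The class `HardLinkTable` and its methods are hypothetical and use no HDFS API. The table is in-memory only; persisting and coordinating it across clients is the hard part, and this sketch does not attempt it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical library-level hardlink table; not part of any HDFS API. */
public class HardLinkTable {
    // Maps each visible link name to the identifier of the shared underlying data.
    private final Map<String, String> linkToData = new HashMap<>();
    // Counts how many link names currently point at each data identifier.
    private final Map<String, Integer> refCount = new HashMap<>();

    /** Registers a closed file's data under a new name. */
    public void register(String path, String dataId) {
        if (linkToData.containsKey(path)) {
            throw new IllegalArgumentException("name already exists: " + path);
        }
        linkToData.put(path, dataId);
        refCount.merge(dataId, 1, Integer::sum);
    }

    /** Adds a second name for the data behind an existing name, without copying. */
    public void link(String existingPath, String newPath) {
        String dataId = linkToData.get(existingPath);
        if (dataId == null) {
            throw new IllegalArgumentException("no such name: " + existingPath);
        }
        register(newPath, dataId);
    }

    /**
     * Removes one name. Returns true when that was the last name,
     * meaning the caller may now delete the underlying data.
     */
    public boolean unlink(String path) {
        String dataId = linkToData.remove(path);
        if (dataId == null) {
            throw new IllegalArgumentException("no such name: " + path);
        }
        int remaining = refCount.merge(dataId, -1, Integer::sum);
        if (remaining == 0) {
            refCount.remove(dataId);
            return true;
        }
        return false;
    }

    /** Returns the data identifier behind a name, or null if the name is unknown. */
    public String resolve(String path) {
        return linkToData.get(path);
    }
}
```

For example, a snapshot tool could call `link("/hbase/table/f1", "/hbase/snapshot/f1")` (paths are illustrative) and later `unlink` the original name; the data stays reachable until the last name is removed.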
> HDFS hardlink
> -------------
>                 Key: HDFS-3370
>                 URL: https://issues.apache.org/jira/browse/HDFS-3370
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Hairong Kuang
>            Assignee: Liyin Tang
>         Attachments: HDFS-HardLink.pdf
> We'd like to add a new feature, hardlink, to HDFS that allows hardlinked files to share
data without copying. Currently we will support hardlinking only closed files, but it could
be extended to unclosed files as well.
> Among many potential use cases of the feature, the following two are primarily used in
> 1. This provides a lightweight way for applications like hbase to create a snapshot;
> 2. This also allows an application like Hive to move a table to a different directory
without breaking current running hive queries.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

