hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4529) Decide the semantic of concat with snapshots
Date Tue, 26 Feb 2013 03:14:16 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586680#comment-13586680 ]

Jing Zhao commented on HDFS-4529:

bq. There is no additional complexity because we already have to deal with the issue of two
INodes sharing the same block(s). Any two versions of the same file INode, snapshotted at
different times, will share the same blocks, regardless of how we implement the concat operation.

This is no longer the case now that we have developed the diff list at the file level to get rid of the
links between file inodes sharing the same blocks. If we want to handle multiple inodes sharing
the same blocks again, we may once more have to create a special class like INodeFileWithLink and
link those inodes in a circular linked list (or find some other way to identify all the
inodes sharing the same blocks).
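To make that bookkeeping concrete, here is a minimal Python sketch of a circular linked list that ties together every inode referencing the same blocks, so that deleting one inode can first check whether any other member still needs those blocks. The class `SharedBlockFile` and the helper `blocks_still_referenced` are hypothetical illustrations, not HDFS's actual `INodeFileWithLink`; real inodes carry far more state.

```python
class SharedBlockFile:
    """Hypothetical, simplified file inode (not HDFS's INodeFileWithLink).

    Inodes that share blocks are linked in a circular singly linked list
    through `next`; an unlinked inode points to itself.
    """

    def __init__(self, name, blocks):
        self.name = name
        self.blocks = tuple(blocks)
        self.next = self  # a lone inode is a one-element ring

    def link(self, other):
        """Splice an unlinked inode `other` into this inode's ring."""
        other.next = self.next
        self.next = other

    def ring(self):
        """Return every inode on the ring, starting from self."""
        members = [self]
        node = self.next
        while node is not self:
            members.append(node)
            node = node.next
        return members

    def unlink(self):
        """Remove self from its ring (O(n) walk to find the predecessor)."""
        prev = self
        while prev.next is not self:
            prev = prev.next
        prev.next = self.next
        self.next = self


def blocks_still_referenced(inode):
    """Blocks that must NOT be freed if `inode` is deleted."""
    return {b for other in inode.ring() if other is not inode for b in other.blocks}


# Hypothetical usage: a live file and its snapshot copy share block 101.
current = SharedBlockFile("/dst/file", [101, 102])
snap_copy = SharedBlockFile("/dst/.snapshot/s1/file", [101])
current.link(snap_copy)
print(blocks_still_referenced(current))  # {101}
snap_copy.unlink()
print(blocks_still_referenced(current))  # set()
```

The ring is what makes block deletion safe, but every operation that creates or drops a sharing inode must also maintain it, which is the extra complexity the comment wants to avoid reintroducing.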

> Decide the semantic of concat with snapshots
> --------------------------------------------
>                 Key: HDFS-4529
>                 URL: https://issues.apache.org/jira/browse/HDFS-4529
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
> The use case of concat is for copying large files across clusters using the following steps:
> - Step 1: The blocks of a file in the source cluster are copied in parallel to transient files in the destination cluster.
> - Step 2: The transient files in the destination cluster are then concatenated in order to obtain the original file.
> If a snapshot is taken in the destination cluster before Step 2, some transient files may be captured in the snapshot. Then what should happen? The following are some alternatives:
> * (1) fail concat and keep the transient files in the snapshots;
> * (2) allow concat and keep the transient files in the snapshots;
> * (3) allow concat but remove the transient files from all snapshots.
> None of the solutions above is perfect. Here are their drawbacks:
> For (1) and (2), the transient files will remain in the system until the snapshots are deleted. This is inefficient for the system since the files are known to be transient. (1) may be able to force users to create files under some non-snapshottable tmp directory in the first place. However, this complicates user applications, and existing applications may need to be updated for the new policy. Also, a non-snapshottable directory may not exist, since the admin may set the system root directory to be snapshottable. For (3), the problem is that it seems to break the read-only snapshot contract: some files that appear in a snapshot may disappear later.
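The trade-offs among the three alternatives can be compared in a toy in-memory model, sketched here in Python. The namespace is just a dict from path to a list of block IDs and a snapshot is a copy of that dict; `ConcatPolicy`, `concat`, `take_snapshot`, and `shared_with_live` are hypothetical names, and this is not how the NameNode implements concat or snapshots.

```python
from enum import Enum


class ConcatPolicy(Enum):
    FAIL = 1          # (1) fail concat; transient files stay in the snapshots
    ALLOW_KEEP = 2    # (2) allow concat; transient files stay in the snapshots
    ALLOW_REMOVE = 3  # (3) allow concat; transient files removed from all snapshots


class SourceInSnapshotError(Exception):
    """Raised under policy (1) when a concat source was captured by a snapshot."""


def take_snapshot(live):
    """Copy the namespace (path -> list of block IDs) so later edits don't alter it."""
    return {path: list(blocks) for path, blocks in live.items()}


def concat(live, snapshots, target, sources, policy):
    """Toy concat: move the sources' blocks, in order, onto the end of `target`."""
    captured = [s for s in sources if any(s in snap for snap in snapshots)]
    if captured and policy is ConcatPolicy.FAIL:
        # Checked before any mutation, so the live namespace is left untouched.
        raise SourceInSnapshotError(f"captured in a snapshot: {captured}")
    for src in sources:
        live[target].extend(live.pop(src))  # source file leaves the live tree
    if policy is ConcatPolicy.ALLOW_REMOVE:
        for snap in snapshots:
            for src in captured:
                snap.pop(src, None)  # rewrites an already-taken snapshot


def shared_with_live(live, snap):
    """Snapshot-only paths whose blocks now also belong to a live file."""
    live_blocks = {b for blocks in live.values() for b in blocks}
    return sorted(p for p, blocks in snap.items()
                  if p not in live and live_blocks.intersection(blocks))


# Hypothetical usage: a snapshot is taken between Step 1 and Step 2.
live = {"/dst/tmp0": [1], "/dst/tmp1": [2], "/dst/tmp2": [3]}
snapshots = [take_snapshot(live)]
concat(live, snapshots, "/dst/tmp0", ["/dst/tmp1", "/dst/tmp2"], ConcatPolicy.ALLOW_KEEP)
print(live)                                  # {'/dst/tmp0': [1, 2, 3]}
print(shared_with_live(live, snapshots[0]))  # ['/dst/tmp1', '/dst/tmp2']
```

In this model, policy (2) leaves snapshot entries whose blocks also belong to the live concatenated file, which is exactly the multiple-inodes-sharing-blocks situation discussed in the comment, while policy (3) avoids that by editing a snapshot that was supposed to be read-only.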
