hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4523) Fix concat for snapshots
Date Mon, 25 Feb 2013 21:22:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586299#comment-13586299 ]

Aaron T. Myers commented on HDFS-4523:
--------------------------------------

I really disagree with this premise. One of the main motivations for supporting snapshots in
HDFS is to allow the admin to roll back to a previous FS state in the case of mistakes.
Imagine for a moment that a large number of errant concat operations are performed on source
files that were captured in a snapshot. The HDFS admin realizes this and wants to restore the
FS to its state before the concat operations were performed. The admin won't be able to do so,
however, because the concat operations will have removed the source files from the snapshots.
This is incongruous with the nature of all the other FS operations, where an action taken in
the present file system has no effect on the files in previously created snapshots.

I realize that concat is a bit of an odd and little-used FS operation, but I still see no
reason that it should be treated fundamentally differently from the other FS operations.
                
> Fix concat for snapshots
> ------------------------
>
>                 Key: HDFS-4523
>                 URL: https://issues.apache.org/jira/browse/HDFS-4523
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: h4523_20130222.patch, h4523_20130223.patch, h4523_20130225.patch
>
>
> The use case of concat is for copying large files across clusters using the following steps.
> - Step 1: The blocks of a file in the source cluster are copied in parallel to transient files in the destination cluster.
> - Step 2: Then the transient files in the destination cluster are concatenated in order to obtain the original file.
> If a snapshot is taken in the destination cluster before Step 2, some transient files may be captured in the snapshot. These transient files should be removed in Step 2.
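To make the disagreement concrete, here is a minimal sketch that models the two-step copy from the issue description in memory. It assumes a namespace is just a dict from path to bytes and a snapshot is a frozen copy of that dict. `ToyNamespace`, `copy_then_concat` and the `purge_from_snapshots` flag are hypothetical names for illustration only; they are not part of the HDFS API (the real entry point is `FileSystem.concat(Path, Path[])`), and the model ignores blocks, inodes and everything else the NameNode actually tracks.

```python
# Toy in-memory model of concat and snapshots. Every name here is hypothetical;
# this is not the HDFS API, just an illustration of the two semantics debated above.


class ToyNamespace:
    def __init__(self):
        self.files = {}      # path -> file contents (immutable bytes)
        self.snapshots = {}  # snapshot name -> frozen view of self.files

    def create(self, path, data):
        self.files[path] = data

    def snapshot(self, name):
        # A shallow copy suffices: the values are immutable bytes objects.
        self.snapshots[name] = dict(self.files)

    def concat(self, target, sources, purge_from_snapshots=False):
        # Append each source to the target, in order, and drop the sources
        # from the live namespace.
        for src in sources:
            self.files[target] += self.files.pop(src)
        if purge_from_snapshots:
            # Behaviour proposed in the issue: the sources also vanish from
            # every earlier snapshot.
            for snap in self.snapshots.values():
                for src in sources:
                    snap.pop(src, None)
        # Otherwise snapshots are untouched, as with the other FS operations.

    def restore(self, name):
        # Roll the live namespace back to the state captured in a snapshot.
        self.files = dict(self.snapshots[name])


def copy_then_concat(ns, blocks, target, purge_from_snapshots):
    # Step 1: each block from the source cluster lands in its own transient file.
    # (A real copy would do this in parallel; the toy does it sequentially.)
    ns.create(target, b"")
    parts = []
    for i, block in enumerate(blocks):
        part = f"{target}.part{i}"
        ns.create(part, block)
        parts.append(part)
    # A snapshot taken before Step 2 captures the transient files.
    ns.snapshot("before-concat")
    # Step 2: concatenate the transient files, in order, into the target.
    ns.concat(target, parts, purge_from_snapshots)
    return parts
```

With `purge_from_snapshots=False`, calling `restore("before-concat")` brings the transient part files back, which is the rollback Aaron argues for. With `purge_from_snapshots=True`, the snapshot keeps only the empty target, matching the issue's proposal that transient files be removed in Step 2.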

