hadoop-hdfs-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-231) Rsync like way of retrieving data from the dfs
Date Sun, 17 Jul 2011 19:19:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066716#comment-13066716 ]

Harsh J commented on HDFS-231:
------------------------------

I'd say that HDFS is the kind of system where you do not really require backups as long as you
have active node monitoring; and for a few files it's all right if you do it non-incrementally.

For incremental retrieval, you can also look at dates (file modification times) to find what has changed.
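
As an illustration of that date-based approach, here is a minimal sketch using the public
FileSystem API. It is an assumption for illustration only, not anything proposed in this issue:
the source and destination paths and the lastBackupTime argument are hypothetical placeholders.

    // Sketch: copy only files modified since the last backup run.
    // Paths and the lastBackupTime argument are hypothetical placeholders.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class DateBasedBackup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem dfs = FileSystem.get(conf);            // source DFS
        FileSystem local = FileSystem.getLocal(conf);     // non-DFS backup target
        Path src = new Path("/data/important");           // hypothetical source dir
        Path dst = new Path("/backup/important");         // hypothetical target dir
        long lastBackupTime = Long.parseLong(args[0]);    // epoch millis of last run

        // Copy only the files whose modification time is newer than the last backup.
        for (FileStatus status : dfs.listStatus(src)) {
          if (!status.isDir() && status.getModificationTime() > lastBackupTime) {
            FileUtil.copy(dfs, status.getPath(),
                          local, new Path(dst, status.getPath().getName()),
                          false /* deleteSource */, conf);
          }
        }
      }
    }

This only catches files whose modification time changed; it does not avoid re-copying a large
file that changed in a small way, which is where the block/CRC idea below comes in.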

> Rsync like way of retrieving data from the dfs
> ----------------------------------------------
>
>                 Key: HDFS-231
>                 URL: https://issues.apache.org/jira/browse/HDFS-231
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Johan Oskarsson
>            Assignee: Sameer Paranjpye
>
> As the dfs in our cluster contains a lot of important data, being able to retrieve it to a non-dfs backup node is essential.
> However, a lot of the files don't change between backups, so a way to get only the files that have changed would be preferable.
> Since the blocks themselves already have a CRC calculated, half the job is already done if it's possible to split the destination files into similar blocks and calculate CRCs for them.
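
On the checksum side of the proposal, a coarse alternative to per-block CRC matching is to skip
files whose whole-file checksum is unchanged. The sketch below is an assumption for illustration,
not the implementation this issue asks for; it uses FileSystem.getFileChecksum, which compares at
file granularity and requires both filesystems to support a comparable checksum algorithm.

    // Sketch: decide whether a destination copy is stale by comparing
    // whole-file checksums. Class and method names are hypothetical.
    import java.io.IOException;
    import org.apache.hadoop.fs.FileChecksum;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChecksumCompare {
      /** Returns true if dst is missing or its checksum differs from src. */
      static boolean needsCopy(FileSystem srcFs, Path src,
                               FileSystem dstFs, Path dst) throws IOException {
        if (!dstFs.exists(dst)) {
          return true;
        }
        FileChecksum srcSum = srcFs.getFileChecksum(src);
        FileChecksum dstSum = dstFs.getFileChecksum(dst);
        // getFileChecksum may return null if a filesystem cannot provide one;
        // fall back to copying in that case to be safe.
        return srcSum == null || dstSum == null || !srcSum.equals(dstSum);
      }
    }

Unlike a true rsync-style transfer, this still re-copies an entire file when any block changes; the
per-block CRC splitting described in the issue is what would make the transfer itself incremental.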

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
