hadoop-hdfs-issues mailing list archives

From "Lohit Vijayarenu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters
Date Mon, 16 Dec 2013 17:15:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849350#comment-13849350 ]

Lohit Vijayarenu commented on HDFS-5442:
----------------------------------------

Thanks for sharing the design document. This looks to be a very good start.
A few initial comments. It might be good to break up the work into two major features:
1. BlockAllocation policy for cross-datacenter placement (which I understand from the design
document is synchronous replication)
2. Asynchronous replication
This would give users the flexibility to choose either feature based on their use case
and infrastructure support.
A few more very high-level comments:

- There seems to be an assumption in a few places that the entire namespace is replicated.
This might not be desirable in many cases. Enabling this feature per directory, or for a
list of directories, would be very useful.
- There seems to be an assumption of a primary cluster and a secondary cluster. Can this be
chained into something like A->B and B->C? Or even the use case of A->B and B->A? Calling
out those topologies with configuration options would be very useful for cluster admins.
- Another place that needs more information is the primary cluster NN tracking datanode
information from the secondary cluster (via the secondary cluster NN). This needs thought to
see if it is really scalable. I assume this would mean DataNodes now have globally unique
identifiers. How are DataNode failures handled and communicated back to the primary NN?
How are DataNodes allocated for reads? How is space accounted for within clusters? What about
unique block ids across different clusters, and so on? Having more details on these would be very useful.
- Minor: It might be worth changing Primary/Secondary to Source/Destination cluster. It is
a little confusing when also thinking about the Primary/Secondary NameNodes in the same document.
- Adding a few failure and recovery cases would be useful. For example, in synchronous replication,
what happens when the secondary cluster is slow or down? How would data be re-replicated?
- How would the ReplicationManager, or changing the replication factor of files, work in general
with this policy?
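To make the per-directory and topology suggestions above concrete, here is a purely hypothetical hdfs-site.xml fragment sketching what such knobs might look like. None of these property names exist in HDFS today; they are invented for discussion only:

```xml
<!-- Hypothetical properties, for discussion only: not part of any existing
     HDFS release. Sketches per-directory enablement, replication mode,
     and an explicit destination cluster (supporting chained topologies). -->
<property>
  <name>dfs.replication.cross-dc.directories</name>
  <value>/data/critical,/user/warehouse</value>
  <!-- Only these directories are replicated, not the whole namespace. -->
</property>
<property>
  <name>dfs.replication.cross-dc.mode</name>
  <value>synchronous</value>
  <!-- or "asynchronous", letting admins pick per their infrastructure. -->
</property>
<property>
  <name>dfs.replication.cross-dc.destination</name>
  <value>hdfs://dc-b-nn:8020</value>
  <!-- Cluster B could in turn name cluster C here, giving A->B->C. -->
</property>
```

Spelling the options out like this in the design document would make it clear what cluster admins actually configure for each topology.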

> Zero loss HDFS data replication for multiple datacenters
> --------------------------------------------------------
>
>                 Key: HDFS-5442
>                 URL: https://issues.apache.org/jira/browse/HDFS-5442
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Avik Dey
>         Attachments: Disaster Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware failures within a datacenter. Hadoop is not designed today to handle datacenter failures. Although HDFS is not designed for, nor deployed in, configurations spanning multiple datacenters, replicating data from one location to another is common practice for disaster recovery and global service availability. There are current solutions available for batch replication using data copy/export tools. However, while providing some backup capability for HDFS data, they do not provide the capability to recover all your HDFS data from a datacenter failure and be up and running again with a fully operational Hadoop cluster in another datacenter in a matter of minutes. For disaster recovery from a datacenter failure, we should provide a fully distributed, zero data loss, low latency, high throughput and secure HDFS data replication solution for multiple datacenter setup.
> Design and code for Phase-1 to follow soon.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
