hadoop-hdfs-issues mailing list archives

From "Jerry Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters
Date Wed, 18 Dec 2013 01:20:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851209#comment-13851209 ]

Jerry Chen commented on HDFS-5442:

{quote}There are two clusters in your design document: the primary cluster and the secondary
cluster. I think we only need one cluster.{quote}
We think it is important to have clear communication and collaboration boundaries between
the regions (datacenters), for the following reasons:

1. When one datacenter fails, another datacenter should take over with a symmetric HA
cluster, rather than leaving behind a single cluster with reduced resources.

2. With a single-cluster approach, the impact on the existing HDFS deployment concept
is huge. An HDFS cluster would no longer be one Active NameNode and one Standby NameNode:
it would span multiple "regions", with two Standby NameNodes in each region. The DataNodes
would also be split into regions, and block locations would not be shared between the
NameNodes of different regions, even though they all belong to a single HDFS cluster. These
conceptual changes would further impact existing Hadoop operations and tooling.

3. With a single-cluster approach, operations must manage all the nodes across
datacenters. This may cause unnecessary cross-site communication, and it loses the
flexibility of managing each datacenter separately.

4. We avoid larger questions, such as how upper-level components like HBase and YARN
could be deployed and run over a single HDFS system spanning multiple sites.

And as to QJM: although it can in theory span datacenters, the peers in the backup datacenter
can easily be out of date when the primary fails. This is because an edit commits once a
majority of journal nodes acknowledge it, with no consideration of where those nodes are
located.
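The majority-quorum argument above can be illustrated with a small sketch (hypothetical numbers, not Hadoop code): with five journal nodes split three-and-two across two sites, every write can commit on acknowledgments from the primary site alone, so the backup site's peers may lag arbitrarily.

```python
# Hypothetical sketch of why a location-unaware majority quorum lets
# remote journal peers fall behind. The node counts are illustrative.

def quorum_commit(acks: int, total: int) -> bool:
    """An edit batch commits once a strict majority of journal nodes ack it."""
    return acks > total // 2

# 5 journal nodes: 3 in the primary datacenter, 2 in the backup datacenter.
primary_jns, backup_jns = 3, 2
total = primary_jns + backup_jns

# Suppose the cross-site link is slow, so only the local nodes ack in time.
local_acks = primary_jns  # 3 acks, all from the primary site
committed = quorum_commit(local_acks, total)

print(committed)   # True: the write commits with zero remote acknowledgments
```

Since every edit can commit this way, nothing forces the backup datacenter's journal nodes to hold the tail of the log at the moment the primary site fails.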

> Zero loss HDFS data replication for multiple datacenters
> --------------------------------------------------------
>                 Key: HDFS-5442
>                 URL: https://issues.apache.org/jira/browse/HDFS-5442
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Avik Dey
>         Attachments: Disaster Recovery Solution for Hadoop.pdf
> Hadoop is architected to operate efficiently at scale for normal hardware failures within
a datacenter. Hadoop is not designed today to handle datacenter failures. Although HDFS is
not designed for nor deployed in configurations spanning multiple datacenters, replicating
data from one location to another is common practice for disaster recovery and global service
availability. There are current solutions available for batch replication using data copy/export
tools. However, while providing some backup capability for HDFS data, they do not provide
the capability to recover all your HDFS data from a datacenter failure and be up and running
again with a fully operational Hadoop cluster in another datacenter in a matter of minutes.
For disaster recovery from a datacenter failure, we should provide a fully distributed, zero
data loss, low latency, high throughput and secure HDFS data replication solution for multiple
datacenter setup.
> Design and code for Phase-1 to follow soon.

This message was sent by Atlassian JIRA
