hadoop-hdfs-issues mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5442) Zero loss HDFS data replication for multiple datacenters
Date Thu, 04 Sep 2014 09:51:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121182#comment-14121182 ]

Hari Sekhon commented on HDFS-5442:

Some thoughts I had on this:

1. Having multiple NameNodes at each DC is non-optional

2. WANdisco runs multiple NameNodes at each DC with a majority global quorum and a weighting
priority for the primary DC.

3. While zero-loss replication is great, a single global namespace spanning multiple datacenters
would be awesome: one HDFS namespace globally, with snapshots in case anyone deletes anything,
and with data available for local read processing at all sites.

4. Path-level replication control should be included, to be able to exclude /tmp and similar
scratch directories and reduce WAN replication traffic.
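A rough sketch of what path-level exclusion could look like (the class and method names here are hypothetical, not an existing HDFS API):

```java
import java.util.List;

// Hypothetical sketch: a filter deciding which HDFS paths get replicated
// cross-DC. Nothing here is an existing HDFS class; excluded prefixes
// would presumably come from configuration.
public class ReplicationPathFilter {
    private final List<String> excludedPrefixes;

    public ReplicationPathFilter(List<String> excludedPrefixes) {
        this.excludedPrefixes = excludedPrefixes;
    }

    /** Returns true if the path should be replicated to the remote DC. */
    public boolean shouldReplicate(String path) {
        for (String prefix : excludedPrefixes) {
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return false; // scratch/temp data stays local
            }
        }
        return true;
    }
}
```

Prefix matching on path components (rather than raw string prefixes) avoids accidentally excluding e.g. /tmpfoo when /tmp is listed.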

5. Async cross-DC replication should be the default, as otherwise performance will drop through the floor.

6. Should have tunable consistency (think Cassandra), where you can choose to have it confirm
replication to one or more nodes at the other DC before acknowledging the write.
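A minimal sketch of how a tunable write acknowledgement could work, analogous to Cassandra's consistency levels (all names here are illustrative, not proposed APIs):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a write is confirmed to the client once a
// configurable number of remote DCs have acknowledged the block.
// requiredRemoteAcks = 0 degenerates to fully async cross-DC replication.
public class TunableWriteAck {
    private final CountDownLatch remoteAcks;

    public TunableWriteAck(int requiredRemoteAcks) {
        this.remoteAcks = new CountDownLatch(requiredRemoteAcks);
    }

    /** Called by the replication pipeline when a remote DC confirms the block. */
    public void onRemoteAck() {
        remoteAcks.countDown();
    }

    /** Returns true if the required remote acks arrived before the timeout. */
    public boolean awaitConfirmed(long timeoutMs) throws InterruptedException {
        return remoteAcks.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

The timeout matters: with a slow or partitioned WAN link the writer needs a policy (fail, or fall back to async) rather than blocking forever.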

7. Should have bandwidth control so as not to flood the WAN link.
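This could be throttled in the spirit of the balancer's bandwidth cap; a rough sketch of a fixed-window throttle (class and semantics are hypothetical):

```java
// Hypothetical sketch: a per-second byte budget for cross-DC replication
// traffic. Assumes each acquire() request is no larger than the budget.
public class WanThrottle {
    private final long bytesPerSecond;
    private long windowStartMs;
    private long bytesSentInWindow;

    public WanThrottle(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
        this.windowStartMs = System.currentTimeMillis();
    }

    /** Blocks until `bytes` may be sent without exceeding the per-second cap. */
    public synchronized void acquire(long bytes) throws InterruptedException {
        long now = System.currentTimeMillis();
        if (now - windowStartMs >= 1000) {      // a new one-second window
            windowStartMs = now;
            bytesSentInWindow = 0;
        }
        while (bytesSentInWindow > 0 && bytesSentInWindow + bytes > bytesPerSecond) {
            long sleepMs = 1000 - (System.currentTimeMillis() - windowStartMs);
            if (sleepMs > 0) Thread.sleep(sleepMs); // wait for the window to roll
            windowStartMs = System.currentTimeMillis();
            bytesSentInWindow = 0;
        }
        bytesSentInWindow += bytes;
    }
}
```

A production version would want a smoother token-bucket shape and a dynamically adjustable cap (compare `dfsadmin -setBalancerBandwidth`).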

8. Must have a monitoring interface to determine the replication lag of the secondary DC.

9. Should support multiple datacenters using global quorum and replication of blocks to all DCs.

10. Should replicate each unique block only once to a datanode at the other DC, which in turn
replicates to other datanodes inside the same DC, minimizing WAN traffic and relying on the
built-in HDFS checksums.

11. Should support chaining replication paths so the 2nd DC replicates to the 3rd DC, instead
of the primary DC having to send the same data twice through its WAN link to both of them.
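The chained topology in point 11 can be sketched as a simple mapping from each DC to its single downstream target, so every block crosses each WAN hop exactly once (names are illustrative, not a proposed API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each DC forwards a block to exactly one downstream
// DC, so the primary's WAN link carries each block once regardless of how
// many DCs participate.
public class ChainTopology {
    /** Maps each DC to its single downstream replication target. */
    public static Map<String, String> buildChain(List<String> dcsInOrder) {
        Map<String, String> next = new HashMap<>();
        for (int i = 0; i + 1 < dcsInOrder.size(); i++) {
            next.put(dcsInOrder.get(i), dcsInOrder.get(i + 1));
        }
        return next; // the last DC in the chain has no downstream target
    }
}
```

The trade-off is latency to the tail of the chain versus WAN bandwidth at the head; a tree topology would be the middle ground.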

> Zero loss HDFS data replication for multiple datacenters
> --------------------------------------------------------
>                 Key: HDFS-5442
>                 URL: https://issues.apache.org/jira/browse/HDFS-5442
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Avik Dey
>            Assignee: Dian Fu
>         Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster Recovery Solution
for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf
> Hadoop is architected to operate efficiently at scale for normal hardware failures within
a datacenter. Hadoop is not designed today to handle datacenter failures. Although HDFS is
not designed for nor deployed in configurations spanning multiple datacenters, replicating
data from one location to another is common practice for disaster recovery and global service
availability. There are current solutions available for batch replication using data copy/export
tools. However, while providing some backup capability for HDFS data, they do not provide
the capability to recover all your HDFS data from a datacenter failure and be up and running
again with a fully operational Hadoop cluster in another datacenter in a matter of minutes.
For disaster recovery from a datacenter failure, we should provide a fully distributed, zero
data loss, low latency, high throughput and secure HDFS data replication solution for multiple
datacenter setup.
> Design and code for Phase-1 to follow soon.

This message was sent by Atlassian JIRA
