Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Mon, 23 Dec 2013 03:32:04 +0000 (UTC)
From: "Jerry Chen (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12676326.1383052308490.2288.1387769524179@arcas>
In-Reply-To: <JIRA.12676326.1383052308490@arcas>
References: <JIRA.12676326.1383052308490@arcas>
Subject: [jira] [Updated] (HDFS-5442) Zero loss HDFS data replication for
 multiple datacenters
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Chen updated HDFS-5442:
-----------------------------

    Attachment: Disaster Recovery Solution for Hadoop.pdf

Thank you all for the comments.
We have improved the journaling design. The key improvements are as follows:
1. For sync namespace journaling, use a pluggable JournalManager that writes directly to the Active NameNode of the mirror cluster.
2. For async namespace journaling, the Active NameNode of the mirror cluster tails the Shared Journal of the primary cluster and applies the edits.
3. As usual, the Active NameNode of the mirror cluster always writes to its own local Shared Journal. (This holds true for both sync and async namespace journaling.)
4. The Standby NameNode of the mirror cluster always tails its local Shared Journal. (This holds true for both sync and async namespace journaling.)
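
To make the four points above concrete, here is a minimal toy sketch in Python. It is not Hadoop code and not the design from the attached document: every class and method name (SharedJournal, NameNode, PrimaryActive, MirrorActive, log_edit, receive, tail_primary) is hypothetical, and transactions are modeled as simple (txid, op) tuples held in memory.

```python
from dataclasses import dataclass, field


@dataclass
class SharedJournal:
    """Toy stand-in for a cluster's Shared Journal: ordered (txid, op) pairs."""
    edits: list = field(default_factory=list)

    def write(self, txid, op):
        self.edits.append((txid, op))

    def read_after(self, txid):
        # Return every edit with a transaction id greater than txid.
        return [e for e in self.edits if e[0] > txid]


@dataclass
class NameNode:
    """Hypothetical NameNode that applies edits to an in-memory namespace."""
    name: str
    local_journal: SharedJournal
    last_txid: int = 0
    namespace: list = field(default_factory=list)

    def apply(self, txid, op):
        if txid <= self.last_txid:
            return  # already applied
        self.namespace.append(op)
        self.last_txid = txid

    def tail(self, journal):
        # Point 4: a Standby tails a journal and applies whatever is new.
        for txid, op in journal.read_after(self.last_txid):
            self.apply(txid, op)


class MirrorActive(NameNode):
    def receive(self, txid, op):
        # Points 1 and 3: apply the edit, then write it to the mirror's
        # own local Shared Journal.
        if txid <= self.last_txid:
            return
        self.apply(txid, op)
        self.local_journal.write(txid, op)

    def tail_primary(self, primary_journal):
        # Point 2: async mode, pull edits from the primary's Shared Journal.
        for txid, op in primary_journal.read_after(self.last_txid):
            self.receive(txid, op)


class PrimaryActive(NameNode):
    def log_edit(self, op, mirror=None):
        txid = self.last_txid + 1
        self.apply(txid, op)
        self.local_journal.write(txid, op)
        if mirror is not None:
            # Sync mode: the plugin journal path writes straight to the
            # mirror cluster's Active NameNode.
            mirror.receive(txid, op)
        return txid


primary_journal, mirror_journal = SharedJournal(), SharedJournal()
primary = PrimaryActive("primary-active", primary_journal)
mirror_active = MirrorActive("mirror-active", mirror_journal)
mirror_standby = NameNode("mirror-standby", mirror_journal)

primary.log_edit("mkdir /a", mirror=mirror_active)  # sync journaling
primary.log_edit("mkdir /b")                        # async: primary journal only
mirror_active.tail_primary(primary_journal)         # mirror catches up
mirror_standby.tail(mirror_journal)                 # standby tails local journal
```

In both modes the mirror's Standby only ever reads the mirror's local Shared Journal, which is the property points 3 and 4 describe.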

With the new approach, we achieve the following:
a. A clearer workflow and concept for cross-datacenter replication.
b. The internal structure and relationships of an HDFS cluster are preserved as much as possible.
c. Cross-site communication caused by multiple edit transfers to the Journal Nodes is avoided.
d. The role of the Active NameNode in the mirror cluster is clearer as well.
e. The functionality of the Standby NameNode is no longer affected by this feature.
f. Checkpointing is potentially simpler.

The latest version of the document is attached. It now also uses the term "mirror cluster" instead of "Secondary cluster", which was pointed out as confusing.

> Zero loss HDFS data replication for multiple datacenters
> --------------------------------------------------------
>
>                 Key: HDFS-5442
>                 URL: https://issues.apache.org/jira/browse/HDFS-5442
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Avik Dey
>         Attachments: Disaster Recovery Solution for Hadoop.pdf, Disaster Recovery Solution for Hadoop.pdf
>
>
> Hadoop is architected to operate efficiently at scale for normal hardware failures within a datacenter. Hadoop is not designed today to handle datacenter failures. Although HDFS is not designed for nor deployed in configurations spanning multiple datacenters, replicating data from one location to another is common practice for disaster recovery and global service availability. There are current solutions available for batch replication using data copy/export tools. However, while providing some backup capability for HDFS data, they do not provide the capability to recover all your HDFS data from a datacenter failure and be up and running again with a fully operational Hadoop cluster in another datacenter in a matter of minutes. For disaster recovery from a datacenter failure, we should provide a fully distributed, zero data loss, low latency, high throughput and secure HDFS data replication solution for multiple datacenter setup.
> Design and code for Phase-1 to follow soon.


--
This message was sent by Atlassian JIRA
(v6.1.5#6160)