hadoop-hdfs-issues mailing list archives

From "Jian Fang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1623) High Availability Framework for HDFS NN
Date Mon, 29 Jun 2015 17:26:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605934#comment-14605934 ]

Jian Fang commented on HDFS-1623:

Could someone please respond to this issue? Bringing up a new name node on a replacement
instance is critical for auto-provisioning a Hadoop cluster with HDFS HA support in the cloud.
Without this support, the HA feature cannot really be used. I also observed that the new
standby name node on the replacement instance can get stuck in safe mode because no data
nodes check in with it. Even with a rolling restart, it may take quite some time to restart
all data nodes in a big cluster, for example one with 4000 data nodes. Besides, restarting
DNs is far too intrusive and is not a preferred operation in production. It also increases the
chance of a double failure, because the standby name node is not really ready for a failover
if the current active name node fails. This is really a big issue.

Please at least provide us with some pointers on why it is difficult to support adding a new
standby to a running DN, and what we need to pay attention to if we need to implement this ourselves.

Thanks again.

> High Availability Framework for HDFS NN
> ---------------------------------------
>                 Key: HDFS-1623
>                 URL: https://issues.apache.org/jira/browse/HDFS-1623
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Sanjay Radia
>             Fix For: 2.0.0-alpha
>         Attachments: HA-tests.pdf, HDFS-1623.rel23.patch, HDFS-1623.trunk.patch, HDFS-High-Availability.pdf,
NameNode HA_v2.pdf, NameNode HA_v2_1.pdf, Namenode HA Framework.pdf, dfsio-results.tsv, ha-testplan.pdf,

