hadoop-hdfs-issues mailing list archives

From "Jian Fang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN
Date Mon, 06 Jul 2015 17:40:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615361#comment-14615361 ]

Jian Fang commented on HDFS-8693:
---------------------------------

[~kihwal], thank you for your comments. Unfortunately, a DNS alias either does not work or
carries a very high maintenance cost in some scenarios, namely when the cluster is maintained
not by a system admin but by software, as with auto-provisioned and auto-managed Hadoop clusters
in the cloud. HDFS clients have a failover proxy configured to switch to one of the two
pre-configured name node addresses. There are cases where the new name node becomes active and
the HDFS clients break, but we are more concerned about the data nodes themselves in a
long-running cluster. refreshNamenodes needs to be fixed for this case.
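
To make the request concrete, here is a minimal sketch of the set arithmetic a fixed refresh
could start from. The refreshNNList code quoted below rejects any difference between the old
and new address lists; the sketch instead splits that difference into addresses to add and
addresses to remove. The class NNListDiff, the methods toAdd and toRemove, and the host names
are all hypothetical and not part of Hadoop, and the sketch uses unresolved addresses and plain
java.util sets rather than Guava. It does not show how BPServiceActor instances would actually
be started or stopped.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not Hadoop code: computes which name node addresses
// a data node would have to start or stop talking to after a refresh.
public class NNListDiff {

    // Addresses in newAddrs but not in oldAddrs (e.g. a replacement standby).
    static Set<InetSocketAddress> toAdd(Set<InetSocketAddress> oldAddrs,
                                        Set<InetSocketAddress> newAddrs) {
        Set<InetSocketAddress> added = new HashSet<>(newAddrs);
        added.removeAll(oldAddrs);
        return added;
    }

    // Addresses in oldAddrs but not in newAddrs (e.g. the name node that was replaced).
    static Set<InetSocketAddress> toRemove(Set<InetSocketAddress> oldAddrs,
                                           Set<InetSocketAddress> newAddrs) {
        Set<InetSocketAddress> removed = new HashSet<>(oldAddrs);
        removed.removeAll(newAddrs);
        return removed;
    }

    public static void main(String[] args) {
        // Assumed example hosts; createUnresolved avoids any DNS lookup.
        InetSocketAddress nn1 = InetSocketAddress.createUnresolved("nn1.example.com", 8020);
        InetSocketAddress nn2 = InetSocketAddress.createUnresolved("nn2.example.com", 8020);
        InetSocketAddress nn3 = InetSocketAddress.createUnresolved("nn3.example.com", 8020);

        Set<InetSocketAddress> oldAddrs = new HashSet<>(Arrays.asList(nn1, nn2));
        Set<InetSocketAddress> newAddrs = new HashSet<>(Arrays.asList(nn1, nn3));

        System.out.println("start actors for: " + toAdd(oldAddrs, newAddrs));
        System.out.println("stop actors for:  " + toRemove(oldAddrs, newAddrs));
    }
}
```

The union of the two results is exactly the symmetric difference that the current check tests
for emptiness, so the existing error path corresponds to either set being non-empty.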

> refreshNamenodes does not support adding a new standby to a running DN
> ----------------------------------------------------------------------
>
>                 Key: HDFS-8693
>                 URL: https://issues.apache.org/jira/browse/HDFS-8693
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, ha
>    Affects Versions: 2.6.0
>            Reporter: Jian Fang
>            Priority: Critical
>
> I tried to run the following command on a Hadoop 2.6.0 cluster with HA support 
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> to refresh name nodes on data nodes after I replaced one name node with a new one so
> that I don't need to restart the data nodes. However, I got the following error:
> refreshNamenodes: HA does not currently support adding a new standby to a running DN.
> Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and the error was thrown by the following code snippet, which
> led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException(
>         "HA does not currently support adding a new standby to a running DN. "
>         + "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>   }
> }
> It looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, support for a new name node after a replacement is critical for
> auto-provisioning a Hadoop cluster with HDFS HA support. Without this support, the HA feature
> cannot really be used. I also observed that the new standby name node on the replacement
> instance could get stuck in safe mode because no data nodes check in with it. Even with a
> rolling restart, it may take quite some time to restart all data nodes in a big cluster, for
> example one with 4000 data nodes. Moreover, restarting DNs is far too intrusive and is not a
> desirable operation in production. It also increases the chance of a double failure, because
> the standby name node is not really ready for a failover if the current active name node
> fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
