hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8693) refreshNamenodes does not support adding a new standby to a running DN
Date Mon, 06 Jul 2015 15:59:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615227#comment-14615227 ]

Kihwal Lee commented on HDFS-8693:
----------------------------------

I do agree that {{refreshNamenodes}} needs to be fixed. This command does not work for federated
HA clusters. Also, if one service actor thread shuts down in HA, there is no way to start
it up again without restarting the datanode. Either the datanode should shut down in this case, or
{{refreshNamenodes}} should be fixed to work with HA.

> refreshNamenodes does not support adding a new standby to a running DN
> ----------------------------------------------------------------------
>
>                 Key: HDFS-8693
>                 URL: https://issues.apache.org/jira/browse/HDFS-8693
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, ha
>    Affects Versions: 2.6.0
>            Reporter: Jian Fang
>            Priority: Critical
>
> I ran the following command on a Hadoop 2.6.0 cluster with HA support:
> $ hdfs dfsadmin -refreshNamenodes datanode-host:port
> The goal was to refresh the name node list on the data nodes after I replaced one name node
> with a new one, so that I would not need to restart the data nodes. However, I got the following error:
> refreshNamenodes: HA does not currently support adding a new standby to a running DN.
> Please do a rolling restart of DNs to reconfigure the list of NNs.
> I checked the 2.6.0 code and found that the error is thrown by the following code snippet, which
> led me to this JIRA.
> void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>   Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>   for (BPServiceActor actor : bpServices) {
>     oldAddrs.add(actor.getNNSocketAddress());
>   }
>   Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>   if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>     // Keep things simple for now -- we can implement this at a later date.
>     throw new IOException(
>         "HA does not currently support adding a new standby to a running DN. " +
>         "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>   }
> }
> It looks like the refreshNamenodes command is an incomplete feature.
> Unfortunately, bringing up a new name node as a replacement is critical for auto-provisioning a
> Hadoop cluster with HDFS HA support. Without this support, the HA feature cannot really
> be used. I also observed that the new standby name node on the replacement instance can
> get stuck in safe mode because no data nodes check in with it. Even a rolling restart
> may take quite some time to cover all data nodes on a big cluster, for example one
> with 4000 data nodes. Restarting DNs is also far too intrusive and is not a preferred
> operation in production. This situation also increases the chance of a double failure, because the
> standby name node is not really ready for a failover if the current active name node
> fails.
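
To make the quoted check concrete, here is a minimal, self-contained sketch of the set comparison it performs. This is not the Hadoop code: it uses only java.util instead of Guava's Sets.symmetricDifference, and the class name RefreshCheckSketch, the helper names, and the example.com hosts are assumptions made for illustration. It shows why replacing one name node makes the difference non-empty, and how the added and removed addresses could be told apart if the refresh were ever extended to handle them.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; not part of Hadoop.
public class RefreshCheckSketch {

    // Elements that are in a but not in b.
    static <T> Set<T> minus(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    // Elements that are in exactly one of the two sets.
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> result = minus(a, b);
        result.addAll(minus(b, a));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical addresses; createUnresolved avoids any DNS lookup.
        InetSocketAddress nn1 = InetSocketAddress.createUnresolved("nn1.example.com", 8020);
        InetSocketAddress nn2 = InetSocketAddress.createUnresolved("nn2.example.com", 8020);
        InetSocketAddress nn3 = InetSocketAddress.createUnresolved("nn3.example.com", 8020);

        // nn2 has been replaced by nn3 in the new configuration.
        Set<InetSocketAddress> oldAddrs = new HashSet<>(Arrays.asList(nn1, nn2));
        Set<InetSocketAddress> newAddrs = new HashSet<>(Arrays.asList(nn1, nn3));

        // A non-empty symmetric difference is the condition that triggers
        // the IOException in the quoted snippet.
        System.out.println("changed:   " + symmetricDifference(oldAddrs, newAddrs));
        // Splitting the difference into its two halves is one possible
        // starting point for supporting the change instead of rejecting it.
        System.out.println("to add:    " + minus(newAddrs, oldAddrs));
        System.out.println("to remove: " + minus(oldAddrs, newAddrs));
    }
}
```

With an unchanged name node list the symmetric difference is empty and the quoted check passes. Any replacement yields at least one added and one removed address, which is why the command fails in the scenario described above.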



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
