hadoop-common-user mailing list archives

From Brahma Reddy Battula <brahmareddy.batt...@huawei.com>
Subject RE: How to restart an HDFS standby namenode dead for a very long time
Date Fri, 15 Jul 2016 07:21:07 GMT
It seems you are hitting the following JIRA. Please refer to:

https://issues.apache.org/jira/browse/HDFS-9917




--Brahma Reddy Battula

From: Zach Cox [mailto:zcox522@gmail.com]
Sent: 14 July 2016 03:34
To: user@hadoop.apache.org
Subject: How to restart an HDFS standby namenode dead for a very long time

Hi - we have an HDFS (version 2.0.0-cdh4.4.0) cluster set up in HA with 2 namenodes and 5 journal
nodes. This cluster has been somewhat neglected (long story), and the standby namenode process
has been dead for several months.

Recently we tried to simply start the standby namenode process again, but several hours later
the entire HDFS cluster (and HBase on top of it) became unavailable, and stayed that way for
several hours. As soon as we stopped the standby namenode process, HDFS (and HBase) started
working fine again. I don't know for sure, but my guess is that the standby namenode was trying
to catch up on several months of edits after being down for so long, and just couldn't do it.
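
For what it's worth, one rough way I think we could gauge how far behind it is (assuming the
standard 2.x layout under dfs.name.dir/current; /data/dfs/nn below is just a placeholder for
our actual dfs.name.dir):

    # On each namenode, the highest transaction id it has seen is recorded in
    # dfs.name.dir/current/seen_txid, and the newest checkpoint is the
    # fsimage_<txid> file. Comparing these between the dead standby and the
    # active namenode should show roughly how many edits it would have to replay.
    cat /data/dfs/nn/current/seen_txid
    ls -lt /data/dfs/nn/current/ | head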

We really need to get this standby namenode process started again, so I'm trying to find the
right way to do it. I've tried starting it with the -bootstrapStandby option, but that appears
broken in our HDFS version. Instead, we can manually rsync the files in dfs.name.dir from
the active namenode.
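
In case it clarifies what I mean, here's a rough sketch of the two approaches (command names
are from stock Hadoop 2.x; active-nn and /data/dfs/nn are placeholders for our actual active
namenode host and dfs.name.dir, and the standby namenode process would be stopped while copying):

    # Option 1: let Hadoop copy the metadata from the active namenode
    # (the documented way, but it appears broken for us on 2.0.0-cdh4.4.0):
    hdfs namenode -bootstrapStandby

    # Option 2: copy the metadata directory manually from the active namenode,
    # then start the standby so it tails the remaining edits from the journal nodes:
    rsync -av --delete active-nn:/data/dfs/nn/current/ /data/dfs/nn/current/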

I guess my question is: is there a recommended way to get this standby namenode resurrected
successfully? And would we need to do anything other than rsync dfs.name.dir from the active
namenode before starting the standby namenode again?

Thanks,
Zach
