hadoop-common-user mailing list archives

From Zach Cox <zcox...@gmail.com>
Subject How to restart an HDFS standby namenode dead for a very long time
Date Wed, 13 Jul 2016 22:04:11 GMT
Hi - we have an HDFS (version 2.0.0-cdh4.4.0) cluster set up in HA with 2
namenodes and 5 journal nodes. This cluster has been somewhat neglected
(long story) and the standby namenode process has been dead for several
months.

Recently we tried to just start the standby namenode process again, but
several hours later the entire HDFS cluster (and HBase on top of it)
became unavailable and stayed down for several hours. As soon as we
stopped the standby namenode process, HDFS (and HBase) started working
fine again. I don't know for
sure, but I'm guessing the standby namenode was trying to catch up on
several months of edits from being down for so long, and just couldn't do
it.

We really need to get this standby namenode process started again, so I'm
trying to find the right way to do it. I've tried starting it with the
-bootstrapStandby option, but that appears broken in our HDFS version.
Instead, we could manually rsync the files in dfs.name.dir from the
active namenode to the standby.

I guess my question is: is there a recommended way to get this standby
namenode resurrected successfully? And would we need to do anything other
than rsync dfs.name.dir from the active namenode before starting the
standby namenode again?

Thanks,
Zach
