hadoop-common-user mailing list archives

From Sumit Kumar <skbrnwl-...@yahoo.com.INVALID>
Subject Re: Reconfigured Namenode stuck in safemode
Date Tue, 03 May 2016 22:29:09 GMT
Hello All,


We've been experimenting with namenode re-configuration in a ZooKeeper-based HA configuration.
We've been able to automate the setup using bigtop scripts, and we're trying to use the
same setup scripts for re-configuration should one of the namenodes die. I found that these
scripts wait for the namenode to exit safe mode by issuing the following command:

hdfs dfsadmin -safemode wait
This works fine for the initial cluster setup. For re-configuration, however, the new namenode
occasionally gets stuck in safemode forever. For easier understanding, let's say we launched
a cluster with 5 hosts, nn1 and nn2 were running fine, and at some point we replace
nn1 with nn3 (a completely new host). For this replacement to take effect, we change the configuration
on all the hosts to point to nn3 and restart the hadoop daemons there. We're seeing that nn1 comes
back online fine but nn3 remains stuck in safemode forever.
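
One way to keep an automated re-configuration from hanging is to poll with a deadline instead of calling -safemode wait with no bound. The following is only a sketch under stated assumptions: the helper name wait_safemode_off, the default timeout and interval, and the hdfs://nn3:8020 address are made up for illustration, and the check relies on hdfs dfsadmin -safemode get printing a line that contains "Safe mode is OFF".

```shell
#!/usr/bin/env bash
# Poll one namenode until it reports safe mode OFF, or give up after a deadline.
# Hypothetical helper; it is not part of the bigtop scripts.
wait_safemode_off() {
  local fs_uri="$1"            # e.g. hdfs://nn3:8020 (assumed RPC address)
  local timeout_s="${2:-600}"  # assumed default: give up after 10 minutes
  local interval_s="${3:-10}"  # assumed default: poll every 10 seconds
  local waited=0
  local status

  while :; do
    # -fs is a generic option that selects which namenode to query.
    status="$(hdfs dfsadmin -fs "$fs_uri" -safemode get 2>&1)"
    case "$status" in
      *"Safe mode is OFF"*) return 0 ;;
    esac
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "timed out after ${waited}s; last status: ${status}" >&2
      return 1
    fi
    sleep "$interval_s"
    waited=$((waited + interval_s))
  done
}
```

A setup script could call wait_safemode_off hdfs://nn3:8020 300 10 and treat a non-zero return as a signal to collect logs rather than block indefinitely.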
Running hdfs dfsadmin -safemode get shows exactly the same: nn1 is fine (out of safemode)
and nn3 is in safemode. If I run hdfs dfsadmin -safemode leave on nn3, it leaves safemode
immediately, and ls, cp, and mv on hdfs then work just fine. We've been debating whether this is
expected behavior and whether we should do one of the following:
   - don't do safemode wait for reconfiguration
   - set dfs.namenode.safemode.threshold-pct to 0 for reconfiguration so the namenode would
     leave safemode without waiting for block reports.
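
Before changing the threshold, it may help to confirm which value a given host actually picks up. The sketch below assumes hdfs getconf -confKey is available on that host; the helper name safemode_threshold_disabled is hypothetical. It treats a value less than or equal to 0 as "the namenode does not wait for any particular percentage of blocks", which is how hdfs-default.xml describes this property.

```shell
# Return 0 if the configured threshold is <= 0, 1 if it is positive,
# and 2 if the value could not be read. Hypothetical helper for illustration.
safemode_threshold_disabled() {
  local pct
  pct="$(hdfs getconf -confKey dfs.namenode.safemode.threshold-pct 2>/dev/null)" || return 2
  [ -n "$pct" ] || return 2
  # Strip a trailing 'f', as in the shipped default "0.999f".
  pct="${pct%f}"
  # awk does the floating-point comparison; exit status 0 means pct <= 0.
  awk -v p="$pct" 'BEGIN { exit !(p + 0 <= 0) }'
}
```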

It seems like we're doing something suspicious here. I did read about HDFS edit logs: would nn3
sync all the HDFS edits from nn2 as it comes up? Do I need to worry about this for reconfiguration?
Any recommendations on which logs we should look at, or on whether this approach seems good to
automate? We would really appreciate any feedback.
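
For the log question, one low-risk starting point is to pull the safe-mode related lines out of the namenode log on nn3. This is a rough sketch: the helper name safemode_log_lines is hypothetical, the log path is passed in because its location is installation-specific, and the grep pattern is a guess at useful keywords, since the exact message wording varies by Hadoop version.

```shell
# Print the most recent safe-mode related lines from a namenode log.
# Hypothetical helper; the caller supplies the log path.
safemode_log_lines() {
  local log_file="$1"
  local max_lines="${2:-20}"   # assumed default: show the last 20 matches
  [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 1; }
  # Match "safe mode", "safemode", and block-report progress messages.
  grep -iE 'safe ?mode|reported blocks' "$log_file" | tail -n "$max_lines"
}
```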
Thanks,
Sumit


  