hadoop-user mailing list archives

From Anu Engineer <aengin...@hortonworks.com>
Subject Re: Safe mode on after restarting hadoop
Date Thu, 22 Dec 2016 16:55:44 GMT
Hi Chathuri,

This means that the NameNode (NN) has not yet received reports for all the blocks it expects from the datanodes.
Since all the datanodes are functional, here are some things to check.

1. Is there any volume loss on the data nodes?

2. You mentioned that you had a failure on the Namenode host. Are you sure that the Namenode metadata
was not affected in any way? For example, you might have accidentally copied an older snapshot
of the Namenode metadata.

This warning should go away once all 10 data nodes have reported their blocks.

Leaving safe mode by itself is not going to cause data corruption, but HDFS is trying to
tell you about a problem, so it would be better to investigate it rather than just ignore it.
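A minimal sketch of checking the safe-mode state from a script before deciding whether to force it off. It assumes the `hdfs` client is on `PATH` and that `hdfs dfsadmin -safemode get` prints one line starting with `Safe mode is ON` or `Safe mode is OFF` per NameNode. The helper names `safemode_is_on` and `read_safemode_report` are hypothetical.

```python
import subprocess


def safemode_is_on(report: str) -> bool:
    """Return True if any NameNode in the `-safemode get` output reports ON.

    Assumption: each status line starts with "Safe mode is ON" or
    "Safe mode is OFF"; in an HA setup there is one such line per NameNode.
    """
    return any(line.strip().startswith("Safe mode is ON")
               for line in report.splitlines())


def read_safemode_report() -> str:
    """Run the real CLI and return its stdout (requires an HDFS client)."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `safemode_is_on(read_safemode_report())` can gate a job-submission script. Alternatively, `hdfs dfsadmin -safemode wait` blocks until the NameNode leaves safe mode on its own, which keeps `hdfs dfsadmin -safemode leave` as an explicit, manual decision made only after the checks above.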


From: Chathuri Wimalasena <kamalasini@gmail.com>
Date: Thursday, December 22, 2016 at 8:02 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Safe mode on after restarting hadoop


We have a hadoop cluster with 10 data nodes. We had a disk failure on the login node, where
the namenode and secondary namenode run, and replaced the failed disk. The failure did not
affect the data; it only affected the operating system. After replacing the disk, when
I restart the hadoop services, hadoop goes into safe mode and does not let us run jobs. The
message below is shown in the namenode UI.

Safe mode is ON. The reported blocks 391253 needs additional 412776 blocks to reach the threshold
0.9990 of total blocks 804833. The number of live datanodes 10 has reached the minimum number
0. Safe mode will be turned off automatically once the thresholds have been reached.
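The numbers in this message can be checked directly. The following is a minimal sketch, assuming the default `dfs.namenode.safemode.threshold-pct` of 0.999 (shown as 0.9990) and that the NameNode rounds the threshold down to a whole number of blocks. The `+ 1` is included only because it reproduces the figure in the message. The function name `safemode_gap` is hypothetical.

```python
def safemode_gap(reported, total, threshold_pct=0.999):
    """Return (threshold_blocks, additional_needed) for the safe-mode message.

    Assumptions: the threshold is total * threshold_pct rounded down, and the
    "additional" figure adds 1, which matches the numbers in the UI message.
    """
    threshold_blocks = int(total * threshold_pct)
    additional_needed = threshold_blocks - reported + 1
    return threshold_blocks, additional_needed


# Values copied from the NameNode UI message.
reported, total = 391253, 804833
threshold_blocks, needed = safemode_gap(reported, total)
print(f"threshold: {threshold_blocks} blocks")
print(f"additional needed: {needed}")
print(f"reported so far: {reported / total:.1%} of {total}")
```

With these values it prints a threshold of 804028 blocks and 412776 additional blocks, matching the UI, with only about 48.6% of the expected blocks reported. With all 10 DataNodes already live, a gap that large is more consistent with replicas the DataNodes do not have (for example, lost volumes) or with NameNode metadata that does not match the DataNodes than with nodes that are still starting up. This holds if the count stops rising once block reports finish.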

I can see all the data nodes are up and running. Also, when I check for corrupt blocks, it
shows 0.

hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://ln02:50070/fsck?ugi=hadoop&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files

Any idea what's going on? I can forcefully leave safe mode, but I'm worried that it
might cause data corruption. Are there any safety steps I should take before forcefully
leaving safe mode?

