hadoop-common-user mailing list archives

From Guy Doulberg <guy.doulb...@conduit.com>
Subject NameNode - didn't persist the edit log
Date Thu, 15 Dec 2011 09:16:33 GMT
Hi guys,

We recently had the following problem on our production cluster:

The filesystem holding the edit log and fsimage ran out of free inodes.
As a result, when a checkpoint was reached, the namenode could not
obtain inodes for the new fsimage and edit log files, even though the
previous files had already been freed.
Unfortunately, we had no monitoring on the inode count, so the namenode
ran in this state for a few hours.
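Since the inode exhaustion went unnoticed for hours, a periodic check of
free inodes on the metadata filesystem would likely have caught it. A
minimal sketch in Python, assuming a Unix host (it relies on os.statvfs)
and a hypothetical 5% alert threshold; NAME_DIRS is a placeholder for the
directories configured in dfs.name.dir, not something from the cluster
described here.

```python
import os


def free_inode_ratio(path: str) -> float:
    """Fraction of inodes still available to non-root users on the
    filesystem that contains `path` (0.0 = none left, 1.0 = all free)."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems allocate inodes dynamically and report 0 total;
        # treat them as never running out.
        return 1.0
    return st.f_favail / st.f_files


def inodes_ok(path: str, min_ratio: float = 0.05) -> bool:
    """True if at least `min_ratio` of the inodes are still free."""
    return free_inode_ratio(path) >= min_ratio


# Hypothetical list of namenode metadata directories; on a real cluster
# these would be the directories listed in dfs.name.dir.
NAME_DIRS = ["/"]

for d in NAME_DIRS:
    ratio = free_inode_ratio(d)
    status = "OK" if inodes_ok(d) else "CRITICAL"
    print(f"{status}: {d} has {ratio:.1%} of its inodes free")
```

Run from cron or an existing monitoring agent, a CRITICAL line would flag
the problem well before the inode table is completely exhausted.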

We noticed this failure on the namenode's DFS status page.

However, the namenode did not enter safe mode, so it kept accepting
writes that could not be persisted to the edit log.

After discovering the problem we freed inodes, and the filesystem
seemed to be okay again. We then tried to force the namenode to persist
its edit log, with no success.

Eventually we restarted the namenode, which of course caused us to lose
all the data written to HDFS during those few hours (fortunately we had
a backup of the recent writes, so we restored the data from there).

This situation raises some serious concerns:
1. Why didn't the namenode enter safe mode when it detected a failure to
persist its edit log? (The exception was logged only at WARN severity
rather than at a critical level.)
2. Why, after we freed inodes, couldn't we get the namenode to persist
its edit log? Perhaps there should be a CLI command that lets us force
the namenode to persist its edit log.

Do you know of an existing JIRA for these issues, or should I open one?

Thanks,
Guy
