Hi team,
What we have found while using Hadoop is that the FSImage is often corrupted
when we start and stop the cluster. We suspect the cause is in how the image is
written to the output stream: the NameNode may be killed while it is in
saveNamespace, so the FSImage file is never completely written. I currently see
a previous.checkpoint folder, and the logic of saveNamespace appears to be:
1. Move the current folder to the previous.checkpoint folder.
2. Write the new FSImage into the current folder.
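To make the failure window concrete, here is a minimal sketch that models only the two steps as I described them. It is not Hadoop's actual FSImage code. The class and method names, the "fsimage" file name, and the crashAfterBytes parameter (which simulates the process being killed mid-write) are all hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical model of the two saveNamespace steps described above; not Hadoop's code.
public class SaveNamespaceSketch {

    static void saveNamespace(Path storageDir, byte[] image, int crashAfterBytes) throws IOException {
        Path current = storageDir.resolve("current");
        Path checkpoint = storageDir.resolve("previous.checkpoint");

        // Step 1: move the current folder aside to previous.checkpoint.
        Files.move(current, checkpoint);
        Files.createDirectory(current);

        // Step 2: write the new image into the (now empty) current folder.
        try (OutputStream out = Files.newOutputStream(current.resolve("fsimage"))) {
            int n = Math.min(crashAfterBytes, image.length);
            out.write(image, 0, n);
            if (n < image.length) {
                // Simulated kill: the rest of the image is never written,
                // leaving a truncated current/fsimage on disk.
                return;
            }
        }
    }
}
```

If the process dies inside step 2, current/fsimage is truncated while previous.checkpoint still holds the last complete image, which matches what we see on restart.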
I think this leads to a case where the FSImage is corrupted and the NameNode
cannot be started, yet we never get an EOFException: if a wrong numBlocks value
is read, instantiating Block[] blocks = new Block[numBlocks] can throw an
OutOfMemoryError before the end of the file is reached (we have actually hit
this).
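As an illustration of why a sanity check on numBlocks would turn the OutOfMemoryError into a clear "corrupt image" error, here is a hedged sketch. It is not the real FSImage loader. It assumes each serialized block is three longs (id, length, generation stamp), and it uses available() only because the input here is an in-memory ByteArrayInputStream; for a file you would compare against the remaining file length instead. BlockCountCheck and readBlocks are hypothetical names.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical defensive reader; not Hadoop's FSImage loading code.
public class BlockCountCheck {
    // Assumption: each serialized block record is three longs (id, length, generation stamp).
    static final int BYTES_PER_BLOCK = 3 * Long.BYTES;

    static long[][] readBlocks(DataInputStream in) throws IOException {
        int numBlocks = in.readInt();
        // A corrupted count can be negative or huge; reject it before allocating.
        // available() is an exact "bytes remaining" only for in-memory streams.
        if (numBlocks < 0 || (long) numBlocks * BYTES_PER_BLOCK > in.available()) {
            throw new IOException("Corrupt image: implausible numBlocks=" + numBlocks);
        }
        long[][] blocks = new long[numBlocks][3];
        for (int i = 0; i < numBlocks; i++) {
            for (int j = 0; j < 3; j++) {
                blocks[i][j] = in.readLong();
            }
        }
        return blocks;
    }
}
```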
Any suggestions?
Thanks,
macf