Hi Tariq,
Thanks for your patience. I know that the fsimage stores the block metadata. I have three machines backing it up, so I am not worried about losing it. I'm using the SNN and NFS to back up the NN data files. But as I described earlier, the damaged data was propagated to every node that I back up to automatically.

BTW: you look like the actor who plays Pi in the movie "Life of Pi" :)
Best regards,
Andy Zhou

2012/12/20 Mohammad Tariq <dontariq@gmail.com>
Hello Andy,

The NN stores all the metadata in a file called "fsimage". The fsimage file contains a snapshot of the HDFS metadata. Along with the fsimage, the NN also keeps "edit log" files. Whenever there is a change to HDFS, it gets appended to the edits file. When these log files grow big, they are merged into the fsimage file. These files are stored on the local FS at the path specified by the "dfs.name.dir" property in the "hdfs-site.xml" file. To prevent any loss, you can give multiple locations as the value of this property, say one on your local disk and another on a network drive, so that if your HD crashes you still have the metadata safe on the network drive. (This is the situation you faced recently.)
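To make the multiple-location setting concrete, here is a minimal Python sketch that reads "dfs.name.dir" from an hdfs-site.xml snippet and splits the comma-separated value into its directories. The two paths and the helper name name_dirs are made-up examples, not anything from your cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content; both paths are invented for illustration.
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name,/mnt/nfs/hadoop/name</value>
  </property>
</configuration>
"""

def name_dirs(xml_text, key="dfs.name.dir"):
    """Return the list of directories configured under the given property."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == key:
            value = prop.findtext("value", "")
            # The value is a comma-separated list of locations.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []

print(name_dirs(HDFS_SITE))
```

The NN writes the same metadata to every listed directory, so this protects against a disk failure but not against bad data, which is written to all of them alike.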

Now, coming to the SNN. It is a helper node for the NN. The SNN periodically pulls the fsimage together with the edits file, which would have grown quite big by then, merges them into a new fsimage, and sends it back to the NN. The NN then starts the cycle again. Suppose you are completely out of luck and lose the entire NN. In that case you can take the copy of the fsimage from the SNN and get your metadata back.
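The merge of the edit log into the fsimage can be pictured with a toy model in Python. This is only an illustration of the idea: the dictionary, the ("put"/"delete") operations, and the function name apply_edits are invented here and have nothing to do with Hadoop's real on-disk formats.

```python
def apply_edits(fsimage, edits):
    """Return a new snapshot with every logged change replayed in order."""
    merged = dict(fsimage)  # leave the old snapshot untouched
    for op, path, value in edits:
        if op == "put":
            merged[path] = value
        elif op == "delete":
            merged.pop(path, None)
    return merged

# Toy snapshot and toy edit log (hypothetical content).
fsimage = {"/a": "blocks-1"}
edits = [("put", "/b", "blocks-2"), ("delete", "/a", None)]

checkpoint = apply_edits(fsimage, edits)  # the merged snapshot
edits = []                                # the log starts over after the merge
```

It also shows why a checkpoint is not a history: once a wrong change is in the log, the next merge carries it into the new snapshot.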


On Thu, Dec 20, 2012 at 3:18 PM, 周梦想 <ablozhou@gmail.com> wrote:
For some reason my name node data became corrupted, and the bad data also overwrote the secondary name node data and the NFS backup. I want to recover the name node data from a day ago or even a week ago, but I can't. Do I have to back up the name node data manually, or write a bash script to do it? Why doesn't Hadoop provide a configuration option to back up the name node data to local disk daily or hourly, with a different timestamp in each name?

The same question applies to HBase's .META. and -ROOT- tables. I think keeping their history is 100 times more important than keeping the log history.

I think it could be implemented in the Secondary NameNode, the Checkpoint Node, or the Backup Node. For now I do this with a bash script.
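A timestamped backup of this kind could look roughly like the sketch below. It is written in Python rather than bash, and it is not the script mentioned above. The function name backup_name_dir, the "namenode-" prefix, and the keep parameter are all assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path

def backup_name_dir(name_dir, backup_root, keep=24, now=None):
    """Copy name_dir to backup_root/namenode-<timestamp>, then prune old copies.

    keep is assumed to be at least 1. now is a Unix time, defaulting to the
    current time; it is only a parameter so the function is easy to try out.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    dest = Path(backup_root) / f"namenode-{stamp}"
    shutil.copytree(name_dir, dest)  # raises if dest already exists
    # Timestamped names sort chronologically, so the oldest come first.
    snapshots = sorted(
        p for p in Path(backup_root).glob("namenode-*") if p.is_dir()
    )
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return dest
```

Run hourly from cron with keep=24, this would hold one day of history. Copying a directory that a running name node is writing to may catch files mid-update, so pointing name_dir at a finished checkpoint copy is probably safer.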

Does anyone agree with me?

Best Regards,
Andy Zhou