hadoop-user mailing list archives

From Steven Rand <stevenjr...@gmail.com>
Subject best practices for backing up HDFS metadata?
Date Fri, 17 Jun 2016 18:45:07 GMT
Hi all,

I'm wondering what the best practices are for backing up HDFS metadata,
i.e., the data inside the directories specified by dfs.namenode.name.dir.
I'd like to be able to recover both from the loss of all of those
directories and from a reformat of the NameNode. I'm interested in backing up:

   - Either the in-memory FSImage, or the most recent on-disk FSImage plus
   the edit log for all subsequent transactions
   - The last seen transaction file
   - The VERSION file
   - Optionally a checksum of the FSImage that I'm backing up (optional
   because I can generate it myself after the fact)

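For reference, those items map onto files in the current/ subdirectory of
each name directory. A hypothetical layout (the transaction IDs embedded in
the filenames will differ on any real cluster):

```shell
# Illustrative contents of one dfs.namenode.name.dir entry:
#   current/VERSION                                  # layoutVersion, namespaceID, clusterID, ...
#   current/seen_txid                                # last seen transaction ID
#   current/fsimage_0000000000000042000              # most recent checkpoint image
#   current/fsimage_0000000000000042000.md5          # its checksum, written by the NameNode
#   current/edits_0000000000000041001-0000000000000042000   # a finalized edit segment
#   current/edits_inprogress_0000000000000042001     # the currently open edit segment
```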
It seems like there are basically two options:

   - Simply make a copy of one of the directories in dfs.namenode.name.dir
   on the active NameNode while it is running. This way I would get the most
   recent on-disk FSImage, the edit logs after that FSImage, the last seen
   transaction file, the VERSION file, and the checksum.
   - Checkpoint the active NameNode, either using a Secondary NameNode, a
   Checkpoint Node, a Backup Node, a standby NameNode in an HA configuration,
   or just by putting the active NameNode in safemode and running hdfs
   dfsadmin -saveNamespace. Then grab the up-to-date on-disk FSImage, along
   with the last seen transaction file, the VERSION file, and optionally the
   checksum of the FSImage.

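A command-level sketch of the two options, assuming a single hypothetical
name directory at /data/dfs/nn and a backup target of /backup (both paths
are made up here, not defaults):

```shell
NAME_DIR=/data/dfs/nn   # one entry of dfs.namenode.name.dir (made-up path)
BACKUP=/backup          # made-up backup destination

# Option 1: copy a name directory from the running active NameNode as-is.
# This captures the latest fsimage, all edit segments, seen_txid, and VERSION.
cp -r "$NAME_DIR/current" "$BACKUP/nn-$(date +%Y%m%d)"

# Option 2: force a checkpoint first, then grab only the newest fsimage
# (plus its .md5 sidecar), seen_txid, and VERSION.
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
LATEST=$(ls "$NAME_DIR"/current/fsimage_* | grep -v '\.md5$' | sort | tail -n 1)
cp "$LATEST" "$LATEST.md5" "$NAME_DIR"/current/seen_txid "$NAME_DIR"/current/VERSION "$BACKUP/"
```

For option 2, hdfs dfsadmin -fetchImage <local dir> is an alternative way to
retrieve the most recent fsimage: it downloads the image from the NameNode
over HTTP rather than reading the name directory directly.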
Is one option any better than the other? The second option seems cleaner,
in that I don't have to worry about backing up any edit logs. However, it
also requires either putting the NameNode in safemode to checkpoint, or
deploying another service to do the checkpointing.

The first option feels a bit strange in that I could be taking the backup
while an application is writing to HDFS; however, that seems to be true of
the second option as well.

I'm curious how people think about backing up HDFS metadata, and what best
practices the community has developed over time. Any thoughts are much
appreciated.

