hadoop-mapreduce-user mailing list archives

From MARCOS MEDRADO RUBINELLI <marc...@buscapecompany.com>
Subject Re: Physically moving HDFS cluster to new
Date Thu, 18 Apr 2013 12:36:27 GMT
Here's a rough guideline:

Moving a cluster isn't all that different from upgrading it. The initial steps are the same:
- stop your mapreduce services
- switch your namenode to safe mode
- generate a final image with -saveNamespace
- stop your hdfs services
- back up your metadata - as long as you have a copy of your metadata, there's a good chance
you can recover a cluster without data loss
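
As a rough sketch, the steps above could be written down as an ordered command list. This assumes a Hadoop 1.x-style install with stop-mapred.sh, stop-dfs.sh and the hadoop command on the PATH; the metadata directory and backup path are hypothetical placeholders, so check them against your own dfs.name.dir before running anything. The script only prints the commands, it does not execute them.

```python
import shlex


def shutdown_plan(metadata_dir, backup_tar):
    """Return the pre-move steps as (description, argv) pairs, in order.

    metadata_dir and backup_tar are placeholders chosen by the caller;
    the script names assume a Hadoop 1.x-style layout.
    """
    return [
        ("stop MapReduce services", ["stop-mapred.sh"]),
        ("put the namenode in safe mode",
         ["hadoop", "dfsadmin", "-safemode", "enter"]),
        ("write a final image",
         ["hadoop", "dfsadmin", "-saveNamespace"]),
        ("stop HDFS services", ["stop-dfs.sh"]),
        ("back up the namenode metadata",
         ["tar", "-czf", backup_tar, "-C", metadata_dir, "."]),
    ]


def render(plan):
    """Turn each argv list into a shell-quoted command line."""
    return [" ".join(shlex.quote(arg) for arg in argv) for _, argv in plan]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    plan = shutdown_plan("/data/dfs/name", "/backup/namenode-meta.tar.gz")
    for (description, _), command in zip(plan, render(plan)):
        print(f"# {description}")
        print(command)
```

Keeping the order in one place makes it harder to skip -saveNamespace or to stop HDFS before the final image is written.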

Now, before you turn off and pack up your machines, it's a good idea to update your hosts,
as Bejoy describes. Assuming you do have the new IPs in advance, of course. It isn't strictly
necessary, but if your services are configured to start on bootup, it will save you the
work of bringing them down, updating your hosts/XMLs, then bringing them up again.
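
For the XML side, here is a minimal sketch of rewriting a property value in a Hadoop-style configuration file, for example fs.default.name (usually in core-site.xml) or mapred.job.tracker (usually in mapred-site.xml). The host names and port are made up, and the helper set_properties is my own, not part of Hadoop. Note that ElementTree drops comments and the XML declaration, so diff the result before replacing the real file.

```python
import xml.etree.ElementTree as ET


def set_properties(xml_text, updates):
    """Return the config XML with the given property values replaced.

    Only properties already present in the file are changed;
    names missing from the file are not added.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.find("value")
        if name in updates and value is not None:
            value.text = updates[name]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical old and new namenode addresses.
    core_site = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://old-nn.example.com:8020</value>
  </property>
</configuration>"""
    print(set_properties(
        core_site,
        {"fs.default.name": "hdfs://new-nn.example.com:8020"},
    ))
```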

Now, when the namenode starts, all it has is the metadata. It knows what files should be in
HDFS, and what blocks belong to which files. But it has no information on where it can find
those blocks. If you run fsck at this point, it will report every file as corrupt. So don't
do it yet; it would only cause unnecessary panic.

When a datanode starts, it scans its data directories, and makes a list of all the blocks
it has. If you configured your cluster right, the datanode will then locate the namenode,
and pass this block report on. After a few minutes, once all your datanodes are online, your
namenode will report a full, healthy file system. You can run some sanity checks, and once
you're satisfied, start the jobtracker and tasktrackers.
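
To make the mechanics concrete, here is a toy model of what happens at startup. It is not Hadoop code, and the class and method names are invented for illustration: the namenode starts with only the file-to-block metadata, and block locations fill in as each datanode sends its block report.

```python
class ToyNameNode:
    """Toy stand-in for a namenode: metadata on disk, locations in memory."""

    def __init__(self, files):
        # file name -> list of block ids (what the saved image contains)
        self.files = files
        # block id -> set of datanode names (empty right after startup)
        self.locations = {}

    def receive_block_report(self, datanode, block_ids):
        """Record that this datanode holds the listed blocks."""
        for block_id in block_ids:
            self.locations.setdefault(block_id, set()).add(datanode)

    def files_with_missing_blocks(self):
        """Files with at least one block no datanode has reported yet."""
        return sorted(
            name
            for name, blocks in self.files.items()
            if any(not self.locations.get(block) for block in blocks)
        )


if __name__ == "__main__":
    nn = ToyNameNode({"/a": ["blk_1", "blk_2"], "/b": ["blk_3"]})
    print(nn.files_with_missing_blocks())   # before any block report
    nn.receive_block_report("dn1", ["blk_1", "blk_3"])
    print(nn.files_with_missing_blocks())   # one datanode online
    nn.receive_block_report("dn2", ["blk_2"])
    print(nn.files_with_missing_blocks())   # all datanodes online
```

The datanode names never appear in the saved metadata in this model, which is why new host names or IPs do not by themselves lose any data.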

Good luck!

On 18-04-2013 02:27, Bejoy Ks wrote:
Adding on to the comments

You might need to update /etc/hosts with the new values.
If the host name changes as well, you may need to update
fs.default.name and mapred.job.tracker with the new values.

On Thu, Apr 18, 2013 at 10:08 AM, Azuryy Yu <azuryyyu@gmail.com> wrote:
Changing the data nodes' names or IPs cannot cause data loss. As long as you keep the fsimage (under the namenode.data.dir)
and all block data on the data nodes, everything can be recovered when you start the

On Thu, Apr 18, 2013 at 1:20 AM, Tom Brown <tombrown52@gmail.com> wrote:
We have a situation where we want to physically move our small (4 node) cluster from one data
center to another. As part of this move, each node will receive both a new FQDN and a new IP
address. As I understand it, HDFS is somehow tied to the FQDN or IP address, and changing
them causes data loss.

Is there any supported method of moving a cluster this way?

Thanks in advance!

