hadoop-common-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: Best way to reduce a 8-node cluster in half and get hdfs to come out of safe mode
Date Fri, 06 Aug 2010 06:21:07 GMT

On Aug 5, 2010, at 10:42 PM, Steve Kuo wrote:

> As part of our experimentation, the plan is to pull 4 slave nodes out of an
> 8-slave/1-master cluster. With replication factor set to 3, I thought
> losing half of the cluster may be too much for hdfs to recover from.  Thus I
> copied out all relevant data from hdfs to local disk and reconfigured the
> cluster.

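It depends.  If you had configured Hadoop with a topology such that the 8 nodes were in
2 logical racks of 4, then pulling one whole rack would have worked just fine, because
rack-aware placement keeps a replica of each block on both racks.  If you didn't have any
topology configured, then all nodes land in a single default rack and replicas are placed
without regard to which nodes you plan to pull.  So pulling half of the grid down means you
are likely losing every replica of a good chunk of your blocks.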
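
For a rough sense of scale, here is a back-of-the-envelope sketch in Python.  It assumes
(this is not from the thread) that each block's 3 replicas sit on 3 distinct nodes picked
uniformly at random from the 8, which ignores the preference HDFS gives to the writer's
local node.  The function name is made up for illustration.

```python
from math import comb

def fraction_of_blocks_lost(total_nodes, pulled_nodes, replication):
    """Chance that every replica of a block sits on a pulled node,
    ASSUMING replicas land on distinct, uniformly random nodes."""
    if replication > pulled_nodes:
        return 0.0  # not enough pulled nodes to hold all replicas
    return comb(pulled_nodes, replication) / comb(total_nodes, replication)

if __name__ == "__main__":
    # 8 slaves, 4 pulled, replication factor 3 (the numbers from this thread)
    p = fraction_of_blocks_lost(8, 4, 3)
    print(f"{p:.4f}")  # prints 0.0714
```

Under that assumption roughly 1 block in 14 loses all three replicas.  A file is unreadable
if any one of its blocks is gone, so multi-block files fare worse than that figure suggests.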

> The 4 slave nodes started okay but hdfs never left safe mode.  The nn.log
> has the following line.  What is the best way to deal with this?  Shall I
> restart the cluster with 8 nodes and then delete
> /data/hadoop-hadoop/mapred/system?  Or shall I reformat hdfs?

Two ways to go:

Way #1:

1) configure dfs.hosts, listing all 8 nodes
2) bring up all 8 nodes
3) configure dfs.hosts.exclude to include the 4 you don't want
4) dfsadmin -refreshNodes to start decommissioning the 4 you don't want
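
A minimal sketch of steps 3 and 4 in Python, in case it helps.  The hostnames, the helper
names and the output path are all placeholders I made up; the real path has to be whatever
dfs.hosts.exclude points at in the namenode's config.

```python
import os
import tempfile

# Hypothetical hostnames; substitute the 4 slaves you want to retire.
NODES_TO_REMOVE = ["slave5", "slave6", "slave7", "slave8"]

def write_host_list(path, hosts):
    """Write one hostname per line, the format used by the files that
    dfs.hosts and dfs.hosts.exclude point at."""
    with open(path, "w") as f:
        for host in hosts:
            f.write(host + "\n")

def refresh_command():
    """Command that tells the namenode to re-read its include/exclude files."""
    return ["hadoop", "dfsadmin", "-refreshNodes"]

if __name__ == "__main__":
    # Placeholder location; use the file named by dfs.hosts.exclude instead.
    exclude_path = os.path.join(tempfile.gettempdir(), "dfs.exclude")
    write_host_list(exclude_path, NODES_TO_REMOVE)
    print("wrote", exclude_path)
    print("now run:", " ".join(refresh_command()))
```

Leave the 4 nodes running until the namenode reports them as decommissioned (the web UI
or `hadoop dfsadmin -report` should show this); only then shut them down.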

Way #2:

1) configure a topology
2) bring up all 8 nodes
3) setrep all files to their current replication +1 (e.g. 3 -> 4)
4) wait for nn to finish replication
5) pull 4 nodes
6) bring down nn
7) remove topology
8) bring nn up
9) setrep all files back down by 1 (e.g. 4 -> 3)
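
For step 1, a topology script can be as small as the sketch below.  Hadoop runs the script
with one or more node addresses as arguments and reads one rack path per argument from
stdout.  The hostnames and rack names here are invented; note that the namenode may hand
the script IP addresses rather than hostnames, so the table may need both.

```python
#!/usr/bin/env python
import sys

# Hypothetical mapping: the 4 nodes you keep in one rack, the 4 you pull in the other.
RACK_BY_HOST = {
    "slave1": "/rack-keep",
    "slave2": "/rack-keep",
    "slave3": "/rack-keep",
    "slave4": "/rack-keep",
    "slave5": "/rack-pull",
    "slave6": "/rack-pull",
    "slave7": "/rack-pull",
    "slave8": "/rack-pull",
}
DEFAULT_RACK = "/default-rack"  # fallback for anything not in the table

def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]

if __name__ == "__main__":
    # Hadoop passes the addresses as command-line arguments.
    print(" ".join(resolve(sys.argv[1:])))
```

Point topology.script.file.name (net.topology.script.file.name in later releases) at the
script and make it executable.  Assuming everything currently sits at replication 3,
steps 3 and 9 would then be along the lines of `hadoop fs -setrep -R 4 /` and
`hadoop fs -setrep -R 3 /`; adding -w makes setrep wait for the replication to complete.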
