hadoop-common-user mailing list archives

From James Hartshorn <jhartsh...@connexity.com>
Subject What is the actual effect of changing rack topology/awareness
Date Mon, 20 Jun 2016 00:20:09 GMT
Hi, I run a 64-node Hadoop/HDFS/YARN cluster and have some questions I've been unable to find
any info on.  We are in the process of both adding 24 more nodes and moving the 64 existing
nodes to a new cage.  The original nodes are currently in six racks (6, 7, 8, 11, 12, and 1 by
default) as reported by dfsadmin -printTopology.  I forgot to add the newest group of the
original nodes to the rack topology file, but since they're all in the same physical rack,
the default grouping happened to be correct anyway.
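For reference, a script-based mapping (via net.topology.script.file.name) looks roughly like the sketch below; the hostnames and rack paths are made-up placeholders, not our real layout. The namenode invokes the script with one or more datanode IPs/hostnames and reads one rack path per argument from stdout:

```shell
#!/bin/sh
# Sketch of a rack-mapping script for Hadoop's net.topology.script.file.name.
# The namenode calls it with one or more datanode IPs/hostnames as arguments
# and expects one rack path per argument on stdout.
# Hostnames and rack paths here are hypothetical placeholders.

rack_for() {
  for host in "$@"; do
    case "$host" in
      hdp0[0-9])  echo /rack6 ;;          # e.g. the oldest generation of nodes
      hdp1[0-9])  echo /rack7 ;;
      *)          echo /default-rack ;;   # fallback for anything unlisted
    esac
  done
}

# In the deployed script this final line would be uncommented:
# rack_for "$@"
```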

With both the new nodes and the new racks I have the chance to fix some problems.  In addition,
I am going to 8 total physical racks.  One problem I will be addressing is multiple generations
of servers (Intel E5-2620 0, v2, v3, and E5-2620 v4).  Currently each generation is grouped
"vertically"; for example, rack 6 is our oldest and is all power-hungry, hot E5-2620 0 CPUs.
 I intend to spread each generation of server out "horizontally" to even out power usage
and allow the same, or close to the same, number of servers per rack.  Because of this alone,
the rack topology of about 80% of our existing servers is changing to something else.

I've already experimented with changing the rack topology of two of my new nodes by updating
the topology file everywhere, running hdfs dfsadmin -refreshNodes on the namenode, and restarting
the datanode.  However, after doing this I saw no effect.
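For anyone reproducing this: the topology script is wired up in core-site.xml roughly as below (the property name is from the Hadoop rack awareness docs; the path is just a placeholder):

```
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
```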

My questions:

How do I effectively get the system to pick up rack topology/awareness changes?

What happens to existing data on a server when the rack topology is changed?

Can the system recognize when a rack is changed but the data does not need to be moved?

How does this affect things when the balancer is run?

Will changing the rack topology for 80% of the servers in a cluster cause a horrible storm
of data moves that brings the whole cluster to its knees?

Documentation on this seems pretty sparse:

Not much here
One Stack Overflow question
One mailing list question that is somewhat related

Info about our cluster:

CDH 5.4.5, YARN, Spark, running on Ubuntu 12.  HA namenodes.

Thank You,

James Hartshorn

