hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Hadoop cluster network requirement
Date Mon, 01 Aug 2011 23:57:42 GMT

Yeah, what he said.
It's never a good idea.
Forget about losing a NN or a rack; think about just losing connectivity between data centers.
(It happens more than you think.)
Your entire cluster, in both data centers, goes down. Boom!

It's a bad design.

You're better off doing two different clusters.

Is anyone really trying to sell this as a design? That's even more scary.


> Subject: Re: Hadoop cluster network requirement
> From: aw@apache.org
> Date: Sun, 31 Jul 2011 20:28:53 -0700
> To: common-user@hadoop.apache.org; saqibj@margallacomm.com
> 
> 
> On Jul 31, 2011, at 7:30 PM, Saqib Jang -- Margalla Communications wrote:
> 
> > Thanks, I'm independently doing some digging into Hadoop networking
> > requirements and 
> > had a couple of quick follow-ups. Could I have some specific info on why
> > different data centers 
> > cannot be supported for master node and data node comms?
> > Also, what 
> > may be the benefits/use cases for such a scenario?
> 
> Most people who try to put the NN and DNs in different data centers are
> trying to achieve disaster recovery: one file system in multiple locations.
> That isn't the way HDFS is designed and it will end in tears. There are
> multiple problems:
> 
> 1) there is no guarantee that one block replica will be in each data center
>    (thereby defeating the whole purpose!)
> 2) assuming one can work out problem 1, during a network break the NN will
>    lose contact with one half of the DNs, causing a massive network
>    replication storm
> 3) if one is using MR on top of this HDFS, the shuffle will likely kill the
>    network in between (making MR performance pretty dreadful) and is going
>    to cause delays for the DN heartbeats
> 4) I don't even want to think about rebalancing.
> 
> ... and I'm sure a lot of other problems I'm forgetting at the moment.
> So don't do it.
> 
> If you want disaster recovery, set up two completely separate HDFSes and
> run everything in parallel.
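
To put rough numbers on problems 1 and 2 from the quoted message, here is a back-of-the-envelope sketch in Python. It is not how HDFS actually places replicas or schedules re-replication. It assumes replicas land on distinct nodes chosen uniformly at random, that every replica lost in the break still has a surviving copy to clone from, and that NameNode throttling can be ignored. The function names, node counts, data volume, and per-node copy rate are all made up for illustration.

```python
from math import comb


def prob_all_replicas_in_one_dc(nodes_dc_a, nodes_dc_b, replication=3):
    """Chance that every replica of a block lands in the same data center.

    Toy assumption: replicas go to distinct nodes chosen uniformly at random.
    This is NOT the real HDFS placement policy.
    """
    total = comb(nodes_dc_a + nodes_dc_b, replication)
    same_dc = comb(nodes_dc_a, replication) + comb(nodes_dc_b, replication)
    return same_dc / total


def rereplication_hours(raw_bytes_cut_off, surviving_nodes, bytes_per_sec_per_node):
    """Upper bound on the hours needed to re-create the replicas that were cut off.

    Assumes every lost replica still has a surviving copy to clone from and
    that each surviving node sustains the given copy rate (hypothetical figures).
    """
    aggregate_rate = surviving_nodes * bytes_per_sec_per_node
    return raw_bytes_cut_off / aggregate_rate / 3600


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes in each data center, replication factor 3.
    p = prob_all_replicas_in_one_dc(10, 10)
    print(f"blocks confined to one data center: {p:.1%}")

    # Hypothetical break: 100 TB of raw replicas unreachable, 50 surviving
    # nodes, each able to copy 50 MB/s.
    hours = rereplication_hours(100e12, 50, 50e6)
    print(f"re-replication after the link drops: about {hours:.1f} hours")
```

Even under these generous assumptions, a noticeable share of blocks lives in only one data center, and the surviving half spends hours copying data that was never actually lost. When the link comes back, all of those extra replicas have to be cleaned up again.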
 		 	   		  