hadoop-common-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: Hadoop cluster network requirement
Date Tue, 02 Aug 2011 00:29:06 GMT
Assuming everything is up, this solution still will not scale given the latency, TCP/IP buffers,
sliding window, etc. See BDP (bandwidth-delay product).
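
For anyone who wants to put numbers on the BDP point, here is a rough sketch. The link speed, round-trip time, and window size are illustrative assumptions, not measurements from any real cluster, and the function names are made up for this example.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    return bandwidth_bps * rtt_seconds / 8


def max_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on a single TCP connection's throughput for a given window."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    # Assumed figures: 1 Gbit/s inter-data-center link, 50 ms round trip.
    bdp = bandwidth_delay_product_bytes(1e9, 0.05)
    print(f"BDP: {bdp / 1e6:.2f} MB must be in flight to fill the link")

    # Assumed figure: a 64 KiB window (no window scaling in effect).
    cap = max_throughput_bps(64 * 1024, 0.05)
    print(f"64 KiB window caps one connection at about {cap / 1e6:.1f} Mbit/s")
```

With these assumed numbers, a single connection with a 64 KiB window uses only about 1% of the link, which is why the latency between data centers matters as much as the raw bandwidth.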

Sent from my iPad

On Aug 1, 2011, at 4:57 PM, Michael Segel <michael_segel@hotmail.com> wrote:

> 
> Yeah, what he said.
> It's never a good idea.
> Forget about losing a NN or a rack; consider just losing connectivity between data centers. (It happens more than you think.)
> Your entire cluster in both data centers goes down. Boom!
> 
> It's a bad design.
> 
> You're better off doing two different clusters.
> 
> Is anyone really trying to sell this as a design? That's even more scary.
> 
> 
>> Subject: Re: Hadoop cluster network requirement
>> From: aw@apache.org
>> Date: Sun, 31 Jul 2011 20:28:53 -0700
>> To: common-user@hadoop.apache.org; saqibj@margallacomm.com
>> 
>> 
>> On Jul 31, 2011, at 7:30 PM, Saqib Jang -- Margalla Communications wrote:
>> 
>>> Thanks, I'm independently doing some digging into Hadoop networking
>>> requirements and had a couple of quick follow-ups. Could I have some
>>> specific info on why different data centers cannot be supported for
>>> master node and data node comms? Also, what may be the benefits/use
>>> cases for such a scenario?
>> 
>>    Most people who try to put the NN and DNs in different data centers are trying
>> to achieve disaster recovery: one file system in multiple locations. That isn't the way
>> HDFS is designed, and it will end in tears. There are multiple problems:
>> 
>> 1) there is no guarantee that one block replica will be in each data center (thereby defeating the whole purpose!)
>> 2) assuming one can work out problem 1, during a network break the NN will lose contact with one half of the DNs, causing a massive network replication storm
>> 3) if one is using MR on top of this HDFS, the shuffle will likely kill the network in between (making MR performance pretty dreadful) and is going to cause delays for the DN heartbeats
>> 4) I don't even want to think about rebalancing.
>> 
>>    ... and I'm sure a lot of other problems I'm forgetting at the moment. So don't do it.
>> 
>>    If you want disaster recovery, set up two completely separate HDFSes and run everything in parallel.
>                         
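
The replication storm in point 2 of the quoted message can be sized with some back-of-the-envelope arithmetic. This is only a sketch under stated assumptions: every block still has at least one replica on the surviving side (that is, problem 1 was somehow solved), so the data to re-copy is roughly the raw bytes held by the unreachable DNs, and each surviving node can spend a fixed share of bandwidth on re-replication. All figures and the function name are hypothetical.

```python
def rereplication_seconds(raw_bytes_on_lost_nodes: float,
                          surviving_nodes: int,
                          per_node_bytes_per_sec: float) -> float:
    """Rough time to restore the replication factor after losing contact with some DNs.

    Assumes every lost replica can be re-created from a surviving copy and that
    the surviving nodes share the copy work evenly.
    """
    aggregate_bytes_per_sec = surviving_nodes * per_node_bytes_per_sec
    return raw_bytes_on_lost_nodes / aggregate_bytes_per_sec


if __name__ == "__main__":
    # Assumed figures: 100 TB of raw block data on the unreachable half,
    # 50 surviving DNs, each sparing 50 MB/s for re-replication.
    seconds = rereplication_seconds(100e12, 50, 50e6)
    print(f"About {seconds / 3600:.1f} hours of sustained re-replication traffic")
```

With these assumed numbers the surviving half spends roughly 11 hours copying data it will have to delete again once the link comes back.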
