Subject: Re: Hadoop cluster network requirement
From: Mohit Anchlia
Date: Mon, 1 Aug 2011 17:29:06 -0700
To: common-user@hadoop.apache.org

Assuming everything is up, this solution still will not scale given the latency, TCP buffers, sliding window, etc. See BDP (bandwidth-delay product).

Sent from my iPad

On Aug 1, 2011, at 4:57 PM, Michael Segel wrote:

> Yeah, what he said.
> It's never a good idea.
> Forget about losing a NN or a rack; just losing connectivity between the data centers is enough. (It happens more often than you think.)
> Your entire cluster in both data centers goes down. Boom!
>
> It's a bad design.
>
> You're better off running two separate clusters.
>
> Is anyone really trying to sell this as a design? That's even scarier.
>
>
>> Subject: Re: Hadoop cluster network requirement
>> From: aw@apache.org
>> Date: Sun, 31 Jul 2011 20:28:53 -0700
>> To: common-user@hadoop.apache.org; saqibj@margallacomm.com
>>
>> On Jul 31, 2011, at 7:30 PM, Saqib Jang -- Margalla Communications wrote:
>>
>>> Thanks, I'm independently doing some digging into Hadoop networking
>>> requirements and had a couple of quick follow-ups. Could I have some
>>> specific info on why different data centers cannot be supported for
>>> master node and data node communications?
>>> Also, what may be the benefits/use cases for such a scenario?
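The BDP point above can be made concrete with a back-of-the-envelope calculation: TCP can only keep a link full if its window covers bandwidth times round-trip time. The 1 Gbit/s link speed and 50 ms RTT below are illustrative assumptions, not figures from this thread:

```python
# Back-of-the-envelope bandwidth-delay product (BDP) calculation.
# Link speed and RTT are hypothetical values for illustration only.

def bdp_bytes(link_bps, rtt_seconds):
    """TCP window needed to keep a link busy: bandwidth * RTT, in bytes."""
    return link_bps * rtt_seconds / 8  # divide by 8: bits -> bytes

# A hypothetical 1 Gbit/s inter-datacenter link with 50 ms RTT:
window = bdp_bytes(1_000_000_000, 0.050)
print(f"required TCP window: {window / 1e6:.2f} MB")  # -> 6.25 MB
```

With default TCP buffer sizes well below that, each connection crawls, which is one reason cross-datacenter DN traffic behaves so differently from rack-local traffic.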
>>
>> Most people who try to put the NN and DNs in different data centers are trying to achieve disaster recovery: one file system in multiple locations. That isn't the way HDFS is designed, and it will end in tears. There are multiple problems:
>>
>> 1) There is no guarantee that one block replica will land in each data center (thereby defeating the whole purpose!).
>> 2) Assuming one can work around problem 1, during a network break the NN will lose contact with one half of the DNs, causing a massive replication storm.
>> 3) If one runs MR on top of this HDFS, the shuffle will likely saturate the network in between, making MR performance pretty dreadful and delaying the DN heartbeats.
>> 4) I don't even want to think about rebalancing.
>>
>> ... and I'm sure there are other problems I'm forgetting at the moment. So don't do it.
>>
>> If you want disaster recovery, set up two completely separate HDFSes and run everything in parallel.
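The replication storm in problem 2 is easy to put rough numbers on: after the split, the NN re-replicates every block that had a replica on the now-unreachable DNs. The 200 TB of affected replicas and 10 Gbit/s of aggregate recovery bandwidth below are invented for this sketch, not measurements:

```python
# Rough estimate of how long post-split re-replication traffic runs.
# The data volume and bandwidth figures are hypothetical.

def rereplication_hours(lost_data_tb, usable_bw_gbps):
    """Hours to re-copy every block the NN now sees as under-replicated."""
    lost_bytes = lost_data_tb * 1e12          # TB -> bytes
    bw_bytes_per_s = usable_bw_gbps * 1e9 / 8 # Gbit/s -> bytes/s
    return lost_bytes / bw_bytes_per_s / 3600

# 200 TB of replicas behind the break, 10 Gbit/s usable for recovery:
print(f"{rereplication_hours(200, 10):.1f} h")  # -> 44.4 h
```

Nearly two days of flat-out copying, competing with whatever jobs are still running, before the cluster is healthy again.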