From: Baskar Duraikannu <baskar.duraikannu@outlook.com>
To: user@hadoop.apache.org
Subject: RE: Multidata center support
Date: Thu, 5 Sep 2013 12:46:55 -0400
Currently there is no relation between weak consistency and Hadoop. I just spent more time thinking about the requirement (as outlined below):

   a) Maintain a total of 3 data centers
   b) Maintain 1 copy per data center
   c) If any data center goes down, don't create additional copies.

The above is not a valid model, especially requirement (c), because it would take away the "strong consistency" model supported by Hadoop. Hope this explains.

I believe we can give up on requirement (c). I am currently exploring whether there is any way to achieve (a) and (b). Requirement (b) can also be relaxed to allow more copies per data center if needed.

From: rahul.rec.dgp@gmail.com
Date: Wed, 4 Sep 2013 10:04:49 +0530
Subject: Re: Multidata center support
To: user@hadoop.apache.org

Under-replicated blocks are also consistent from a consumer's point of view. Care to explain the relation of weak consistency to Hadoop?

Thanks,
Rahul

On Wed, Sep 4, 2013 at 9:56 AM, Rahul Bhattacharjee wrote:

Adam's response makes more sense to me: offline-replicate generated data from one cluster to another across data centers.

I am not sure whether a configurable block placement policy is supported in Hadoop. If it is, then along with rack awareness you should be able to achieve the same.

I could not follow your question related to weak consistency.

Thanks,
Rahul

On Wed, Sep 4, 2013 at 2:20 AM, Baskar Duraikannu wrote:

Rahul,

Are you talking about the rack-awareness script?

I did go through rack awareness. Here are the problems with rack awareness with respect to my (given) business requirement:

1. By default, Hadoop places two copies on the same rack and one copy on some other rack. This would work as long as we have two data centers; if the business wants three data centers, the data would not be spread across all of them. Separately, there is a question of whether this is the right thing to do at all. The business has promised to buy enough bandwidth that the data centers will be only a few milliseconds apart in latency.

2. I believe Hadoop automatically re-replicates data if one or more nodes go down. Assume one out of two data centers goes down: there will be a massive data flow to create additional copies. When I say "data center support", I should be able to configure Hadoop to:

   a) Maintain 1 copy per data center
   b) If any data center goes down, don't create additional copies.

The requirements I am pointing at would essentially move Hadoop from a strongly consistent to a weak/eventually consistent model. Since this changes the fundamental architecture, it would probably break all sorts of things... It might never be possible in Hadoop.
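For reference, rack awareness is driven by a topology script, configured via topology.script.file.name (net.topology.script.file.name in Hadoop 2) in core-site.xml: Hadoop invokes the script with one or more datanode IPs or hostnames and expects one network path per line on stdout. A minimal sketch, assuming a made-up host-to-location mapping, that encodes a data-center layer as /dc/rack:

```python
#!/usr/bin/env python
# Minimal sketch of a Hadoop topology script. Hadoop passes one or more
# datanode IPs/hostnames as arguments and expects one network path per
# line on stdout. The mapping below is a hypothetical example; a real
# deployment would load it from a maintained file or inventory system.
import sys

TOPOLOGY = {
    "10.1.0.11": "/dc1/rack1",
    "10.1.0.12": "/dc1/rack2",
    "10.2.0.11": "/dc2/rack1",
    "10.3.0.11": "/dc3/rack1",
}

def resolve(host):
    """Return the network location for a host, falling back to the
    conventional /default-rack for unknown hosts."""
    return TOPOLOGY.get(host, "/default-rack")

if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(resolve(host))
```

Note that even with a data-center level encoded in the path, the stock block placement policy only reasons about node and rack levels; as Jun Ping points out elsewhere in this thread, deeper topology layers (HADOOP-8848) are not honored by replica placement.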
Thoughts?

Sadak, is there a way to implement the above requirement via Federation?

Thanks,
Baskar

Date: Sun, 1 Sep 2013 00:20:04 +0530
Subject: Re: Multidata center support
From: visioner.sadak@gmail.com
To: user@hadoop.apache.org

What do you think, friends? I think Hadoop clusters can run on multiple data centers using FEDERATION.

On Sat, Aug 31, 2013 at 8:39 PM, Visioner Sadak wrote:

The only problem, I guess, is that Hadoop won't be able to duplicate data from one data center to another. But I guess I can identify datanodes or namenodes from another data center; correct me if I am wrong.

On Sat, Aug 31, 2013 at 7:00 PM, Visioner Sadak wrote:

Let's say you have some machines in Europe and some in the US. I think you just need the IPs and to configure them in your cluster setup; it will work...

On Sat, Aug 31, 2013 at 7:52 AM, Jun Ping Du wrote:

Hi,

Although you can set a datacenter layer in your network topology, it is never enabled in Hadoop due to the lack of replica placement and task scheduling support. There is some work to add layers other than rack and node under HADOOP-8848, but it may not suit your case. I agree with Adam that a cluster spanning multiple data centers does not seem to make sense, even for the DR case. Do you have other cases that call for such a deployment?

Thanks,
Junping

From: "Adam Muise" <amuise@hortonworks.com>
To: user@hadoop.apache.org
Sent: Friday, August 30, 2013 6:26:54 PM
Subject: Re: Multidata center support

Nothing has changed. DR best practice is still one (or more) clusters per site, with replication handled via distributed copy or some variation of it.
A cluster spanning multiple data centers is a poor idea right now.

On Fri, Aug 30, 2013 at 12:35 AM, Rahul Bhattacharjee wrote:

My take on this:

Why does Hadoop have to know about the data-center layer at all? I think it can be installed across multiple data centers; however, a topology configuration would be required to tell which node belongs to which data center and switch, for block placement.

Thanks,
Rahul

On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu wrote:

We have a need to set up Hadoop across data centers. Does Hadoop support a multi data center configuration? I searched through the archives and found that Hadoop did not support multi data center configurations some time back. Just wanted to see whether the situation has changed.

Please help.

--
Adam Muise
Solution Engineer
Hortonworks
amuise@hortonworks.com
416-417-4037

Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.

Hortonworks Virtual Sandbox
Hadoop: Disruptive Possibilities by Jeff Needham

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited.
If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.