From: Siddharth Tiwari <siddharth.tiwari@live.com>
To: user@hadoop.apache.org
Subject: RE: One petabyte of data loading into HDFS within 10 min.
Date: Mon, 10 Sep 2012 19:22:57 +0000
Well, can't you load only the incremental data? The goal seems quite unrealistic. The big guns have already spoken :P


*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."
"Maybe other people will try to limit me but I don't limit myself"



From: Alex.Gauthier@Teradata.com
To: user@hadoop.apache.org; mike.segel@thinkbiganalytics.com
Subject: RE: One petabyte of data loading into HDFS within 10 min.
Date: Mon, 10 Sep 2012 16:17:20 +0000

Well said, Mike. Lots of “funny questions” around here lately…


From: Michael Segel [mailto:michael_segel@hotmail.com]
Sent: Monday, September 10, 2012 4:50 AM
To: user@hadoop.apache.org
Cc: Michael Segel
Subject: Re: One petabyte of data loading into HDFS within 10 min.


On Sep 10, 2012, at 2:40 AM, prabhu K <prabhu.hadoop@gmail.com> wrote:



Hi Users,

Thanks for the response.

We have loaded 100GB of data into HDFS in 1 hour with the configuration below.

Each node (1 master machine, 2 slave machines):

1. 500 GB hard disk
2. 4 GB RAM
3. 3 quad-core CPUs
4. 1333 MHz clock speed

Now, we are planning to load 1 petabyte of data (a single file) into HDFS and a Hive table within 10-20 minutes. For this we need clarification on the points below.

Ok...

Some say that I am sometimes too harsh in my criticisms, so take what I say with a grain of salt...

You loaded 100GB in an hour using woefully underperforming hardware and are now saying you want to load 1PB in 10 minutes.

I would strongly suggest that you first learn more about Hadoop. No, really. Looking at your first machine, it's obvious that you don't really grok Hadoop and what it requires to achieve optimum performance. You couldn't even extrapolate any meaningful data from your current environment.

Secondly, I think you need to actually think about the problem. Did you mean PB or TB? Your math seems to be off by a couple of orders of magnitude.
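
To put a number on that gap, here is a rough back-of-the-envelope sketch (decimal units, 1 PB = 1,000,000 GB, assumed purely for illustration):

    # Gap between the measured load rate (100 GB in 1 hour) and the
    # stated goal (1 PB in 10 minutes), in decimal units.
    measured_rate = 100 / 3600            # GB/s actually achieved (~0.028)
    required_rate = 1_000_000 / 600       # GB/s needed (~1,667)
    print(f"measured: {measured_rate:.3f} GB/s")
    print(f"required: {required_rate:,.0f} GB/s")
    print(f"speedup needed: {required_rate / measured_rate:,.0f}x")  # 60,000x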

A single file measured in PBs? That is currently impossible using today's (2012) technology. In fact, a single file measured in PBs won't exist within the next 5 years, and most likely not within the next decade. [Moore's law is all about CPU power, not disk density.]

Also take a look at networking.

ToR switch designs differ, but with current technology the fabric tends to max out at 40Gb/s. What's the widest fabric on a backplane?

That's your first bottleneck: even if you had 1PB of data, you couldn't feed it to the cluster fast enough.
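
For scale, a rough sketch (decimal units assumed) of pushing 1PB through a single 40Gb/s fabric:

    # Time to move 1 PB through one 40 Gb/s fabric, ignoring
    # replication and all protocol overhead.
    pb_bits = 1e15 * 8          # 1 PB in bits
    fabric_bps = 40e9           # 40 Gb/s
    hours = pb_bits / fabric_bps / 3600
    print(f"{hours:.0f} hours")  # ~56 hours, nowhere near 10 minutes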

Forget disk. Look at PCIe-based memory. (Money is no object, right?)

You still couldn't populate it fast enough.

I guess Steve hit the nail on the head when he talked about this being a homework assignment.

High school, maybe?




1. What system configuration is required for all 3 machines?
2. Hard disk size.
3. RAM size.
4. Motherboard.
5. Network cable.
6. How much Gbps InfiniBand is required?

For the same setup, do we need a cloud computing environment too?

Please advise and help me with this.

Thanks,

Prabhu.

On Fri, Sep 7, 2012 at 7:30 PM, Michael Segel <michael_segel@hotmail.com> wrote:

Sorry, but you didn't account for the network saturation.

And why 1GbE and not 10GbE? Also, which version of Hadoop?

Here, MapR works well bonding two 10GbE ports, and with the right switch you could do OK.
Also 2 ToR switches per rack, etc...

How many machines? 150? 300? more?

Then you don't talk about how much memory, CPUs, what type of storage...

Lots of factors.
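
As one hypothetical data point (figures assumed for illustration, not measured anywhere in this thread): if every node could ingest at a bonded 2 x 10GbE line rate, the node count for 1PB in 10 minutes works out as:

    # Nodes needed for 1 PB in 10 minutes if each node ingests at a
    # bonded 2 x 10 GbE rate (~20 Gb/s), with no replication or overhead.
    aggregate_bps = 1e15 * 8 / 600        # ~13.3 Tb/s required
    per_node_bps = 2 * 10e9               # bonded 2 x 10 GbE
    print(f"{aggregate_bps / per_node_bps:.0f} nodes")   # ~667 nodes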

I'm sorry to interrupt this mental masturbation about how to load 1PB in 10 min.
There are a lot more questions that should have been asked but weren't.

Hey, but look. It's a Friday, so I suggest some pizza and beer, and then take it to a whiteboard.

But what do I know? In a different thread, I'm talking about how to tame HR and Accounting so they let me play with my team Ninja!
:-P


On Sep 5, 2012, at 9:56 AM, zGreenfelder <zgreenfelder@gmail.com> wrote:

> On Wed, Sep 5, 2012 at 10:43 AM, Cosmin Lehene <clehene@adobe.com> wrote:
>> Here's an extremely naïve ballpark estimation: at theoretical hardware
>> speed, for 3PB representing 1PB with 3x replication
>>
>> Over a single 1Gbps connection (and I'm not sure you can actually reach
>> 1Gbps)
>> (3 petabytes) / (1 Gbps) = 291.271111 days
>>
>> So you'd need at least 40,000 1Gbps network cards to get that in 10 minutes
>> :) - (3PB/1Gbps)/40000
>>
>> The actual number of nodes would depend a lot on the actual network
>> architecture, the type of storage you use (SSD, HDD), etc.
>>
>> Cosmin
>
> ah, I went the other direction with the math, and assumed no
> replication (completely unsafe and never reasonable for a real,
> production environment, but since we're all theory and just looking
> for starting point numbers)
>
>
> 1PB in 10 min ==
> 1,000,000 GB in 10 min ==
> 8,000,000 Gb in 600 seconds ==
>
> 80,000/6 ~= 14k machines running at gigabit, or about 1.5k machines if you
> get 10Gb connected machines.
>
> all assuming there's no network or cluster sync overhead
> (of course there would be)
>
>
> that seems like some pretty deep pockets to get to < 10 minute load
> time for that much data.
>
> I could also be off, I just threw some stuff together somewhat
> quickly, between conf calls.
>
> --
> Even the Magic 8 ball has an opinion on email clients: Outlook not so good.
>
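
Both ballparks above hold up. A quick sketch reproducing them in decimal units (1 PB = 1e15 bytes, assumed; the 291-day figure quoted above matches binary-prefix units, 2^50 bytes and 2^30 bit/s, exactly):

    # Reproduce Cosmin's and zGreenfelder's estimates in decimal units.
    PB_BITS = 1e15 * 8                    # one petabyte, in bits
    TEN_MIN = 600                         # seconds

    # Cosmin: 3 PB (1 PB with 3x replication) over a single 1 Gbps link.
    days = 3 * PB_BITS / 1e9 / 86400
    print(f"3 PB over 1 Gbps: {days:.1f} days")          # ~277.8 decimal
                                                         # (291.27 with binary units)
    cards = 3 * PB_BITS / 1e9 / TEN_MIN
    print(f"1 Gbps cards for 10 minutes: {cards:,.0f}")  # 40,000

    # zGreenfelder: 1 PB, no replication.
    machines = PB_BITS / 1e9 / TEN_MIN
    print(f"1 GbE machines: {machines:,.0f}")            # ~13,333 (~14k)
    print(f"10 GbE machines: {machines / 10:,.0f}")      # ~1,333 (~1.5k)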
