Subject: Re: Time taken to do a word count on 10 TB data.
From: Shashidhar Rao <raoshashidhar123@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 15 Apr 2014 09:21:20 +0530

Thanks, Stanley Shi.

On Tue, Apr 15, 2014 at 6:25 AM, Stanley Shi <sshi@gopivotal.com> wrote:

> Rough estimation: since word count requires very little computation, it
> is I/O-centric, so we can base the estimate on disk speed.
>
> Assume 10 disks per node at 100 MBps each, i.e. about 1 GBps per node;
> assume 70% utilization in the mappers, which gives 700 MBps per node.
> For 30 nodes that is about 20 GBps in total, so we need about 500
> seconds for 10 TB of data. Adding some MapReduce overhead and the final
> merge, say 20%, we can expect about 10 minutes here.
>
> On Tuesday, April 15, 2014, Shashidhar Rao <raoshashidhar123@gmail.com>
> wrote:
>
>> Hi,
>>
>> Can somebody provide me a rough estimate of the time taken, in hours
>> or minutes, for a cluster of, say, 30 nodes to run a MapReduce job
>> that performs a word count on, say, 10 TB of data, assuming the
>> hardware and the MapReduce program are tuned optimally.
>>
>> Just a rough estimate; the data could be 5 TB, 10 TB, or 20 TB. If not
>> word count, the job could simply analyze data of the above size.
>>
>> Regards,
>> Shashidhar
>
> --
> Regards,
> Stanley Shi
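
For anyone who wants to plug in their own numbers, here is a minimal
Python sketch of the same back-of-envelope arithmetic. The disk count,
per-disk speed, mapper utilization, and overhead figures are the
assumptions from Stanley's estimate, not measured values.

    # Back-of-envelope estimate of word-count runtime on a Hadoop cluster,
    # following the I/O-bound reasoning above. Every parameter default is
    # an assumption carried over from the estimate, not a measurement.

    def estimate_wordcount_seconds(
        data_tb: float = 10.0,      # input size in TB (decimal units)
        nodes: int = 30,            # cluster size
        disks_per_node: int = 10,   # assumed spindles per node
        disk_mbps: float = 100.0,   # assumed sequential read speed per disk (MB/s)
        utilization: float = 0.70,  # assumed fraction of raw bandwidth mappers reach
        overhead: float = 0.20,     # assumed scheduling + final-merge overhead
    ) -> float:
        per_node_mbps = disks_per_node * disk_mbps * utilization  # 700 MB/s
        cluster_mbps = per_node_mbps * nodes                      # ~21,000 MB/s (~20 GBps)
        data_mb = data_tb * 1_000_000                             # 10 TB = 10^7 MB
        return (data_mb / cluster_mbps) * (1.0 + overhead)

    if __name__ == "__main__":
        secs = estimate_wordcount_seconds()
        print(f"~{secs:.0f} s (~{secs / 60:.1f} min)")  # ~571 s, roughly 10 minutes

With the same cluster, 20 TB of input doubles the figure to roughly 19
minutes, which is the kind of scaling the original question was after.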