hadoop-mapreduce-user mailing list archives

From Shashidhar Rao <raoshashidhar...@gmail.com>
Subject Re: Time taken to do a word count on 10 TB data.
Date Tue, 15 Apr 2014 03:51:20 GMT
Thanks, Stanley Shi.


On Tue, Apr 15, 2014 at 6:25 AM, Stanley Shi <sshi@gopivotal.com> wrote:

> Rough estimate: word count needs very little computation, so it is
> I/O-bound and we can estimate from disk speed.
>
> Assume each node has 10 disks at 100 MB/s each, which is about 1 GB/s per
> node. At 70% utilization in the mappers, that leaves 700 MB/s per node. For
> 30 nodes the total is about 20 GB/s, so reading 10 TB takes about
> 500 seconds.
> Adding roughly 20% for MapReduce overhead and the final merge, we can
> expect about 10 minutes.
>
>
> On Tuesday, April 15, 2014, Shashidhar Rao <raoshashidhar123@gmail.com>
> wrote:
>
>> Hi,
>>
>> Can somebody provide a rough estimate, in hours or minutes, of the time
>> a cluster of, say, 30 nodes would take to run a MapReduce word count job
>> on, say, 10 TB of data, assuming that the hardware and the MapReduce
>> program are tuned optimally?
>>
>> Just a rough estimate; the data could be 5 TB, 10 TB or 20 TB. If not word
>> count, the job could be any analysis over data of that size.
>>
>> Regards
>> Shashidhar
>>
>
>
> --
> Regards,
> *Stanley Shi,*
>
>
>
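
The back-of-envelope estimate quoted above can be written as a short calculation, which makes it easy to try the 5 TB and 20 TB cases from the original question. This is only a sketch of that arithmetic, not a benchmark: the function name and parameter names are hypothetical, the defaults are the figures assumed in the reply (10 disks per node, 100 MB/s per disk, 70% utilization, 20% overhead), and it assumes decimal units (1 TB = 1,000,000 MB) and a job that stays bound by disk reads.

```python
def estimate_job_seconds(data_tb, nodes, disks_per_node=10, disk_mb_per_s=100,
                         utilization=0.7, overhead=0.2):
    """Rough runtime estimate for a disk-bound MapReduce scan (hypothetical helper).

    Defaults mirror the assumptions in the quoted reply; they are not measurements.
    """
    # Effective read throughput of one node, in MB/s.
    node_mb_per_s = disks_per_node * disk_mb_per_s * utilization
    # Aggregate throughput of the whole cluster, in MB/s.
    cluster_mb_per_s = nodes * node_mb_per_s
    # Assumption: decimal units, 1 TB = 1,000,000 MB.
    data_mb = data_tb * 1_000_000
    # Time to read the input once, plus a flat allowance for
    # MapReduce overhead and the final merge.
    scan_seconds = data_mb / cluster_mb_per_s
    return scan_seconds * (1 + overhead)


if __name__ == "__main__":
    for size_tb in (5, 10, 20):
        minutes = estimate_job_seconds(size_tb, nodes=30) / 60
        print(f"{size_tb} TB on 30 nodes: about {minutes:.1f} minutes")
```

With the default assumptions the 10 TB case comes out a little under 10 minutes, in line with the figure in the reply, which rounds the cluster throughput of 21 GB/s down to 20 GB/s. Because the model is linear in data size, the 5 TB and 20 TB cases are simply half and double that.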
