hadoop-user mailing list archives

From Anil Gupta <anilgupt...@gmail.com>
Subject Re: Number of Maps running more than expected
Date Thu, 16 Aug 2012 14:27:13 GMT
Hi Gaurav,

Did you turn off speculative execution?
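
(On CDH3 / Hadoop 0.20 it is on by default. A minimal sketch of turning it off for both
maps and reduces in mapred-site.xml:)

  <!-- Disable speculative (backup) attempts for map tasks -->
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <!-- Disable speculative attempts for reduce tasks -->
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>

Speculative attempts show up as extra launched map tasks in the JobTracker UI, which is
why it is worth ruling out first.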

Best Regards,
Anil

On Aug 16, 2012, at 7:13 AM, Gaurav Dasgupta <gdsayshi@gmail.com> wrote:

> Hi users,
>  
> I am working on a CDH3 cluster of 12 nodes (Task Trackers running on all 12 nodes
> and 1 node running the Job Tracker).
> In order to perform a WordCount benchmark test, I did the following:
> Executed "RandomTextWriter" first to create 100 GB of data (note that I changed only
> the "test.randomtextwrite.total_bytes" parameter; everything else was left at its default).
> Next, executed the "WordCount" program on that 100 GB dataset.
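> For reference, the invocations were along these lines (the examples jar path and the
> HDFS directory names here are approximate, not my exact ones):
>
>   hadoop jar /usr/lib/hadoop/hadoop-examples.jar randomtextwriter \
>       -Dtest.randomtextwrite.total_bytes=107374182400 rtw-input
>   hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount rtw-input wc-output
>
> (107374182400 = 100 * 1024^3 bytes, i.e., 100 GB.)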
> The "Block Size" in "hdfs-site.xml" is set as 128 MB. Now, according to my calculation,
total number of Maps to be executed by the wordcount job should be 100 GB / 128 MB or 102400
MB / 128 MB = 800.
> But when I execute the job, it runs a total of 900 Maps, i.e., 100 extra.
> So, why these extra Maps? The job does complete successfully without any errors.
>  
> Again, if I don't execute the "RandomTextWriter" job to create the data for my wordcount,
> but instead put my own 100 GB text file in HDFS and run "WordCount", the number of Maps
> then matches my calculation, i.e., 800.
>  
> Can anyone tell me why Hadoop behaves this way, with the number of WordCount Maps
> inflated only when the dataset is generated by RandomTextWriter? And what is the
> purpose of these extra Maps?
>  
> Regards,
> Gaurav Dasgupta
