hadoop-hdfs-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Number of Maps running more than expected
Date Fri, 17 Aug 2012 09:58:32 GMT
Hi Gaurav

To add more clarity to my previous mail:
If you are using the default TextInputFormat, there will be *at least* one
map task generated per file, even if the file size is less than
the block size (assuming your split size equals the block size).

So the right way to calculate the number of splits is per file, not from
the total input data size. Compute the number of blocks for each file;
summing those values across all input files gives the number of mappers.
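As a rough sketch (not Hadoop's actual source, and ignoring FileInputFormat's 10% "slop" on the last split), the per-file counting works like this; the file sizes below are made-up examples:

```java
public class SplitCount {
    // Splits for one file: at least 1, even when the file is
    // smaller than the split size (ceiling division).
    static long splitsForFile(long fileSize, long splitSize) {
        if (fileSize == 0) return 1;
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024;  // 64 MB, the Hadoop 1.x default block size
        // Hypothetical input: one 10 MB file and one 200 MB file
        long[] fileSizes = {10L * 1024 * 1024, 200L * 1024 * 1024};
        long mappers = 0;
        for (long size : fileSizes) {
            mappers += splitsForFile(size, block);
        }
        // 10 MB file -> 1 split; 200 MB file -> ceil(200/64) = 4 splits
        System.out.println(mappers); // prints 5
    }
}
```

Note that summing the two file sizes first (210 MB) would wrongly suggest ceil(210/64) = 4 mappers; counting per file gives 5.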

What is the value of mapred.max.split.size in your job? If it is less than
the HDFS block size, there will be more than one split even within a single
HDFS block.
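This behavior follows from the split-size rule FileInputFormat uses, splitSize = max(minSize, min(maxSize, blockSize)); a small sketch of that rule (the 32 MB max-split value is a made-up example):

```java
public class SplitSize {
    // Mirrors FileInputFormat's computeSplitSize:
    // splitSize = max(minSplitSize, min(maxSplitSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024;  // 64 MB HDFS block
        long min   = 1L;                 // mapred.min.split.size left at default
        long max   = 32L * 1024 * 1024;  // mapred.max.split.size set below the block size
        // Max split size below the block size: each 64 MB block yields
        // two 32 MB splits, so roughly twice as many map tasks.
        System.out.println(computeSplitSize(block, min, max)); // prints 33554432
    }
}
```

So with mapred.max.split.size at its default (effectively unbounded), the split size collapses to the block size, which is the assumption in the calculation above.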

Bejoy KS
