hadoop-mapreduce-user mailing list archives

From Pedro Costa <psdc1...@gmail.com>
Subject Re: mapred.min.split.size
Date Fri, 18 Mar 2011 22:59:01 GMT
As I understand it, mapred.min.split.size defines the minimum size of a
split. Consider the case below:

(1) HDFS block size = 32MB, mapred.min.split.size = 64MB
(as far as I know, mapred.min.split.size only has an effect when it is
set larger than the HDFS block size)

When I run a MapReduce job, does this mean that each map task processes
one input split of 64MB, which in reality spans 2 HDFS blocks? Is this
right?
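
To make the arithmetic in case (1) concrete, here is a minimal sketch of
how I believe the old-API FileInputFormat derives the split size, namely
max(minSize, min(goalSize, blockSize)). The class name, the helper
blocksPerSplit and the goalSize value are hypothetical, chosen only for
illustration; this is not the Hadoop source.

```java
// Sketch only: mirrors max(minSize, min(goalSize, blockSize)).
// SplitSizeSketch and blocksPerSplit are hypothetical names, not Hadoop classes.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Split size as I understand the old-API rule: the minimum split size
    // wins whenever it exceeds min(goalSize, blockSize).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of HDFS blocks a full, block-aligned split covers (ceiling division).
    static long blocksPerSplit(long splitSize, long blockSize) {
        return (splitSize + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 32 * MB;  // dfs.block.size from case (1)
        long minSize = 64 * MB;    // mapred.min.split.size from case (1)
        long goalSize = 256 * MB;  // assumed: total input size / suggested map count

        long split = computeSplitSize(goalSize, minSize, blockSize);
        System.out.println("split size (MB): " + split / MB);
        System.out.println("blocks per split: " + blocksPerSplit(split, blockSize));
    }
}
```

Under these assumptions the split size comes out as 64MB, so one map
task would read a split covering two 32MB blocks. If the minimum were
left below the block size, the same rule would fall back to 32MB here.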



On Fri, Mar 18, 2011 at 8:12 PM, Marcos Ortiz <mlortiz@uci.cu> wrote:
> On 3/18/2011 3:54 PM, Pedro Costa wrote:
>>
>> Hi
>>
>> What's the purpose of the parameter "mapred.min.split.size"?
>>
>> Thanks,
>>
>
> Several parameters control the number of map tasks for a job, and
> mapred.min.split.size sets the minimum size of a split. Other
> parameters are:
> - mapred.map.tasks: the suggested number of map tasks
> - dfs.block.size: the file system block size, in bytes, of the input file
>
> Regards
>
> --
> Marcos Luís Ortíz Valmaseda
>  Software Engineer
>  Universidad de las Ciencias Informáticas
>  Linux User # 418229
>
> http://uncubanitolinuxero.blogspot.com
> http://www.linkedin.com/in/marcosluis2186
>
>



-- 
Pedro
