hadoop-hdfs-user mailing list archives

From Juwei Shi <shiju...@gmail.com>
Subject Re: how to use mapred.min.split.size option ?
Date Wed, 25 May 2011 14:51:33 GMT
The following applies to Hadoop 0.20.2.

2011/5/25 Juwei Shi <shijuwei@gmail.com>

> The input split size is determined by mapred.min.split.size, dfs.block.size and
> mapred.map.tasks.
>
> goalSize  = totalSize / mapred.map.tasks
> minSize   = max(mapred.min.split.size, minSplitSize)
> splitSize = max(minSize, min(goalSize, dfs.block.size))
>
> minSplitSize is a lower bound set by each InputFormat, such as
> SequenceFileInputFormat.
>
> You may want to refer to FileInputFormat.java for more details.
>
>
> 2011/5/25 Mapred Learn <mapred.learn@gmail.com>
>
>> Resending ====>
>>
>>
>> > Hi,
>> > I have input splits that are only a few MB each.
>> > I want each mapper to get 1 GB of input. Does anyone know how I can
>> > do this?
>> > Currently each mapper gets one input split, which results in many small
>> > map-output files.
>> >
>> > I tried setting -Dmapred.map.min.split.size=<number>, but it still does
>> > not take effect.
>> >
>> > Thanks,
>> > -JJ
>>
>
> --
> - Juwei Shi
>
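
A minimal Java sketch of the split-size rule quoted above, assuming the old-API (0.20.2) FileInputFormat semantics. The class and method names are hypothetical, not Hadoop API, and the input size (10 GB), block size (64 MB) and map-task count (2) are assumed values chosen only for illustration. It shows why the block size caps the split size until mapred.min.split.size is raised above it.

```java
// Hypothetical helper class (not part of Hadoop); it only mirrors the
// 0.20.2 split-size formula so the arithmetic can be checked by hand.
class SplitSizeSketch {

    // goalSize = totalSize / mapred.map.tasks (0 map tasks treated as 1)
    static long goalSize(long totalSize, int mapTasks) {
        return totalSize / (mapTasks == 0 ? 1 : mapTasks);
    }

    // minSize = max(mapred.min.split.size, minSplitSize)
    static long minSize(long minSplitSizeProp, long formatMinSplitSize) {
        return Math.max(minSplitSizeProp, formatMinSplitSize);
    }

    // splitSize = max(minSize, min(goalSize, dfs.block.size))
    static long splitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long gb = 1024L * mb;
        long totalSize = 10 * gb;  // assumed: 10 GB of input
        long blockSize = 64 * mb;  // assumed: 64 MB dfs.block.size
        int mapTasks = 2;          // assumed: mapred.map.tasks = 2

        long goal = goalSize(totalSize, mapTasks); // 5 GB

        // mapred.min.split.size left at 1 byte: the block size wins.
        long defaultSplit = splitSize(goal, minSize(1, 1), blockSize);
        // mapred.min.split.size = 1 GB: minSize now dominates.
        long tunedSplit = splitSize(goal, minSize(gb, 1), blockSize);

        System.out.println("default split size = " + defaultSplit / mb + " MB");
        System.out.println("tuned split size   = " + tunedSplit / mb + " MB");
    }
}
```

With these assumed values the default split is 64 MB, while setting mapred.min.split.size to 1073741824 (1 GB) yields 1024 MB splits. Note that FileInputFormat applies this size within each file; it does not merge separate files into one split, so many small input files still produce at least one split per file.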
-- 
- Juwei Shi (史巨伟)
