hadoop-mapreduce-user mailing list archives

From Mapred Learn <mapred.le...@gmail.com>
Subject Re: how to use mapred.min.split.size option ?
Date Wed, 25 May 2011 14:59:33 GMT
Thanks, Juwei!
I will go through this.

Sent from my iPhone

On May 25, 2011, at 7:51 AM, Juwei Shi <shijuwei@gmail.com> wrote:

> The following applies to Hadoop 0.20.2.
> 
> 2011/5/25 Juwei Shi <shijuwei@gmail.com>
> The input split size is determined by mapred.min.split.size, dfs.block.size and mapred.map.tasks.

> 
> goalSize  = totalSize / mapred.map.tasks
> minSize   = max(mapred.min.split.size, minSplitSize)
> splitSize = max(minSize, min(goalSize, dfs.block.size))
> 
> minSplitSize is determined by each InputFormat such as SequenceFileInputFormat. 
> 
> You may want to refer to FileInputFormat.java for more details. 
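
The three formulas quoted above can be sketched in plain Java as follows. This is
only an illustration of the arithmetic, not the Hadoop source: the class name
SplitSizeSketch, the method name, the parameter names and the input sizes are all
made up for the example, and the guard against zero map tasks is an addition that
does not appear in the formulas.

```java
public class SplitSizeSketch {

    // Hypothetical helper that mirrors the quoted formulas:
    //   goalSize  = totalSize / mapred.map.tasks
    //   minSize   = max(mapred.min.split.size, minSplitSize)
    //   splitSize = max(minSize, min(goalSize, dfs.block.size))
    public static long computeSplitSize(long totalSize, int numMapTasks,
                                        long configuredMinSize, long formatMinSize,
                                        long blockSize) {
        // Guard against division by zero (an assumption added for this sketch).
        long goalSize = totalSize / (numMapTasks == 0 ? 1 : numMapTasks);
        long minSize = Math.max(configuredMinSize, formatMinSize);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;

        // Illustrative numbers: 10 GB of input, 100 map tasks, 64 MB blocks.
        // With a minimum split size of 1 byte, the block size caps the split.
        System.out.println(computeSplitSize(10240 * mb, 100, 1, 1, 64 * mb));        // 67108864

        // Raising the minimum split size to 1 GB makes it win over the block size.
        System.out.println(computeSplitSize(10240 * mb, 100, 1024 * mb, 1, 64 * mb)); // 1073741824
    }
}
```

With these made-up inputs, the split size only exceeds the block size once the
minimum split size is raised above it, which is what the last formula implies.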
> 
> 
> 2011/5/25 Mapred Learn <mapred.learn@gmail.com>
> Resending ====>
> 
> 
> > Hi,
> > I have a few input splits that are only a few MB in size.
> > I want to submit 1 GB of input to every mapper. Does anyone know how I can do it?
> > Currently each mapper gets one input split, which results in many small map-output files.
> >
> > I tried setting -Dmapred.map.min.split.size=<number>, but it still does not take effect.
> >
> > Thanks,
> > -JJ
> 
> 
> 
> -- 
> - Juwei Shi
> 
> 
> 
> -- 
> - Juwei Shi (史巨伟)
