hadoop-common-user mailing list archives

From sandeep das <yarnhad...@gmail.com>
Subject Re: Increasing input size to MAP tasks
Date Wed, 06 Jan 2016 06:24:30 GMT
Hi All,

You can ignore this mail. I've found the configuration parameters I was
looking for, i.e. pig.maxCombinedSplitSize and pig.splitCombination.
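
For anyone hitting the same limit, a minimal sketch of the settings that worked for me, placed at the top of the Pig script (the 805306368 value is just an example; use whatever combined size you need):

    -- make sure split combining is enabled
    set pig.splitCombination true;
    -- let each map task consume combined splits of up to ~768 MB
    set pig.maxCombinedSplitSize 805306368;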


Regards,
Sandeep

On Tue, Jan 5, 2016 at 4:39 PM, sandeep das <yarnhadoop@gmail.com> wrote:

> Hi All,
>
> I have a Pig script which runs over YARN. Each MAP task created by this Pig
> script takes 128 MB as input and not more than that.
>
> I want to increase the input size of each map task. I've read that the input
> size is determined using the following formula:
>
> max(min split size, min(block size, max split size)).
>
> Following are the values I'm setting for these parameters:
>
> dfs.blocksize = 134217728
> mapreduce.input.fileinputformat.split.maxsize = 1610612736
> mapreduce.input.fileinputformat.split.minsize = 805306368
> mapreduce.input.fileinputformat.split.minsize.per.node = 222298112
> mapreduce.input.fileinputformat.split.minsize.per.rack = 222298112
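>
> Plugging these values into the formula above, as a sanity check:
>
>     max(805306368, min(134217728, 1610612736))
>       = max(805306368, 134217728)
>       = 805306368   (i.e. a 768 MB split)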
>
> According to the values configured, the input size should be 805306368, but
> it is still 134217728, which is the same as dfs.blocksize.
>
> But every time I change my dfs.blocksize to a higher value, the input to the
> MAP tasks increases by the same amount.
>
>
> Following is the setup:
> Cloudera : 5.5.1
> Hadoop: 2.6.0
> Pig: 0.12.0
>
>
> Regards,
> Sandeep
>
