hadoop-common-user mailing list archives

From sandeep das <yarnhad...@gmail.com>
Subject Increasing input size to MAP tasks
Date Tue, 05 Jan 2016 11:09:17 GMT
Hi All,

I have a Pig script that runs on YARN. Each map task created by this Pig
script takes 128 MB as input and never more than that.

I want to increase the input size of each map task. I've read that the input
split size is determined using the following formula:

max(min split size, min(block size, max split size)).
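To make the formula concrete, here is a minimal Python sketch of it. `compute_split_size` is a hypothetical helper written for illustration, not a Hadoop API, and the sample values are round numbers chosen only to show which term wins in each case:

```python
def compute_split_size(min_split_size: int, block_size: int, max_split_size: int) -> int:
    """Split size per max(min split size, min(block size, max split size))."""
    return max(min_split_size, min(block_size, max_split_size))


MB = 1024 * 1024

# No effective limits: the block size wins.
print(compute_split_size(1, 128 * MB, 2**63 - 1) // MB)  # 128
# A minimum split size above the block size overrides the block size.
print(compute_split_size(768 * MB, 128 * MB, 1536 * MB) // MB)  # 768
# A maximum split size below the block size caps the split.
print(compute_split_size(1, 128 * MB, 64 * MB) // MB)  # 64
```

In other words, the minimum split size is the only knob in this formula that can push a split above the block size.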

Here are the values I'm setting for these parameters:

dfs.blocksize = 134217728
mapreduce.input.fileinputformat.split.maxsize = 1610612736
mapreduce.input.fileinputformat.split.minsize = 805306368
mapreduce.input.fileinputformat.split.minsize.per.node = 222298112
mapreduce.input.fileinputformat.split.minsize.per.rack = 222298112

According to the configured values, the input size should be 805306368
(768 MB), but it is still 134217728 (128 MB), which is the same as
dfs.blocksize.

However, whenever I increase dfs.blocksize, the input to each map task
increases by the same amount.
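The sketch below plugs the values listed above into the formula. It then tests one hypothesis that would be consistent with the observed behaviour, offered only as an assumption and not as a confirmed diagnosis: that the minimum split size never reaches the job and falls back to 1 byte. In that case the formula reduces to min(block size, max split size), which simply tracks dfs.blocksize. `compute_split_size` is a hypothetical helper, and the dictionary keys are just the property names from this message:

```python
MB = 1024 * 1024

# Values from the configuration listed above.
configured = {
    "dfs.blocksize": 134217728,
    "mapreduce.input.fileinputformat.split.maxsize": 1610612736,
    "mapreduce.input.fileinputformat.split.minsize": 805306368,
}


def compute_split_size(min_split_size: int, block_size: int, max_split_size: int) -> int:
    """Hypothetical helper: max(min split size, min(block size, max split size))."""
    return max(min_split_size, min(block_size, max_split_size))


max_size = configured["mapreduce.input.fileinputformat.split.maxsize"]

# Expected split size if all three settings are honoured.
expected = compute_split_size(
    configured["mapreduce.input.fileinputformat.split.minsize"],
    configured["dfs.blocksize"],
    max_size,
)
print(expected, expected // MB)  # 805306368 768

# Assumption: the minimum split size is not applied and stays at 1 byte.
# The split size then equals the block size for each block size tried.
tracked = {b: compute_split_size(1, b * MB, max_size) // MB for b in (128, 256, 512)}
print(tracked)  # {128: 128, 256: 256, 512: 512}
```

If the second result matches what the job actually does, that would suggest checking whether the minimum split size setting is taking effect at all.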

Here is my setup:
Cloudera: 5.5.1
Hadoop: 2.6.0
Pig: 0.12.0

