Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of rongzheyi@gmail.com
 designates 209.85.128.41 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAG1CohQ1=80KwTwKo9ypiN62WtvPWFuYPdsq=1mPnOW5BOH6XA@mail.gmail.com>
References: 
 <CALtSBbaYkmBEcBbtG3yMFFAFNsJ1hCdZeatDLghn3xzBR67Y=A@mail.gmail.com>
 <CAEAKFL8ieWsWShqCa+zJrwhETZpEnP2uBi+-5fRbdVSygch5_w@mail.gmail.com>
 <1361612254.36060.YahooMailNeo@web194705.mail.sg3.yahoo.com>
 <1361623966.2874.YahooMailNeo@web194705.mail.sg3.yahoo.com>
 <CANiuQZek+GjdZi_5xCgLznEzYzv9+FQXWBTog_mYdJw8iueGSA@mail.gmail.com>
 <1361706030.51309.YahooMailNeo@web194706.mail.sg3.yahoo.com>
 <CAORpBsgTK=ZQoWBWzzrsC696zc=mXGzJWDY3tCxpRyOZocaTPA@mail.gmail.com>
 <1361706818.42489.YahooMailNeo@web194702.mail.sg3.yahoo.com>
 <1362390140.71433.YahooMailNeo@web194701.mail.sg3.yahoo.com>
 <1363250753.42275.YahooMailNeo@web194704.mail.sg3.yahoo.com>
 <1363253090.10644.YahooMailNeo@web194702.mail.sg3.yahoo.com>
 <CAG1CohTuHaKec3V3e1vrMx196Kp7E4tS1ENMSan905+DiG9pSA@mail.gmail.com>
 <CADPnNAJz1ANic=k68-HRgOv=d5XhhWCqaT7qv4kTNNeqqkEP3g@mail.gmail.com>
 <CAG1CohRUsevSHdY=1Ae3Bw6pLccBPzzj9OTKxyLQvo-LdkoWzQ@mail.gmail.com>
 <CAG1CohQ1=80KwTwKo9ypiN62WtvPWFuYPdsq=1mPnOW5BOH6XA@mail.gmail.com>
From: Zheyi RONG <rongzheyi@gmail.com>
Date: Fri, 15 Mar 2013 11:32:55 +0100
Message-ID: 
 <CADPnNAL88FzR-5pcyFfmb3N4pp7RLztqTscQVm+F+2Zofds7bw@mail.gmail.com>
Subject: Re: Increase the number of mappers in PM mode
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=002354471084c674b404d7f42a0f

--002354471084c674b404d7f42a0f
Content-Type: text/plain; charset=ISO-8859-1

Indeed, you cannot set the number of mappers explicitly, but you can still
gain some control over it by setting mapred.max.split.size or
mapred.min.split.size.

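For example, if you have a 10 GB file (10737418240 B) and would like 10
mappers, each mapper has to process 1 GB of data.
Since "splitsize = max(minimumSize, min(maximumSize, blockSize))", raising
the minimum above the block size forces larger splits, so you can set
mapred.min.split.size=1073741824 (1 GB), e.g.
$ hadoop jar yourjar -Dmapred.min.split.size=1073741824 yourargs
Note that the -D option goes after the jar (and after the main class, if
you pass one), and it only takes effect if your driver parses the generic
options, for example through ToolRunner.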
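
To check the arithmetic, here is a small Python sketch of the formula quoted
above. It is not Hadoop code: the function names are made up for
illustration, the 64 MB block size and the default minimum and maximum are
assumptions, and the mapper count is a plain ceiling division, whereas
Hadoop lets the last split run slightly over.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def compute_split_size(block_size, min_size, max_size):
    """splitsize = max(minimumSize, min(maximumSize, blockSize))"""
    return max(min_size, min(max_size, block_size))


def estimate_num_mappers(file_size, split_size):
    """Rough mapper count for one splittable file (ceiling division)."""
    return -(-file_size // split_size)


file_size = 10 * GB        # 10737418240 B, as in the example above
block_size = 64 * MB       # assumed HDFS block size
default_min = 1            # assumed default for mapred.min.split.size
default_max = 2 ** 63 - 1  # assumed default for mapred.max.split.size

# Without tuning, the split size falls back to the block size.
default_split = compute_split_size(block_size, default_min, default_max)
print(estimate_num_mappers(file_size, default_split))

# With mapred.min.split.size=1073741824 the minimum wins over the block size.
tuned_split = compute_split_size(block_size, 1 * GB, default_max)
print(estimate_num_mappers(file_size, tuned_split))
```

Under these assumptions the untuned job gets 160 mappers, and the tuned one
gets the 10 mappers we wanted.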

It is well explained in this thread:
http://stackoverflow.com/questions/9678180/change-file-split-size-in-hadoop.

Regards,
Zheyi.

On Fri, Mar 15, 2013 at 8:49 AM, YouPeng Yang <yypvsxf19870706@gmail.com> wrote:

> s

--002354471084c674b404d7f42a0f--