Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of java8964@hotmail.com
 designates 65.55.90.101 as permitted sender)
Message-ID: <SNT149-W40DC0A888B8AB256BC9839D0DC0@phx.gbl>
Content-Type: multipart/alternative;
	boundary="_17bebd23-2797-4399-b85a-121514dd4014_"
From: java8964 <java8964@hotmail.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: RE: issue about Shuffled Maps in MR job summary
Date: Thu, 12 Dec 2013 10:16:51 -0500
Importance: Normal
In-Reply-To: <SNT149-W30F0CFE3C36D6402CECD9FD0DC0@phx.gbl>
References: 
 <CACeqxwRAfMysGxSeH=GxC-EvXfLTo+7AdaJ9zRFbk1DzJccCNg@mail.gmail.com>,<5DF48A23D7B14649BBA72C2F64C6663B82B356DB@szxeml523-mbx.china.huawei.com>,<CACeqxwRcbBDtKQ_vPYzghaDjxYi3QGH_54__Xc1oaNZNcUhvPQ@mail.gmail.com>,<SNT149-W36729B58BE3D39C1F5C1C0D0DD0@phx.gbl>,<CACeqxwREtDOFc1xCOm5qyqBruoKXGnmczys-oHF-1knjmQAoTQ@mail.gmail.com>,<SNT149-W93300F0E79B8461E8AFFACD0DC0@phx.gbl>,<CACeqxwQfCw7W=aXPSK-0x80vn9Lj3+KtUCBcq4Hjn3-vZ-v9RQ@mail.gmail.com>,<SNT149-W30F0CFE3C36D6402CECD9FD0DC0@phx.gbl>
MIME-Version: 1.0

--_17bebd23-2797-4399-b85a-121514dd4014_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Or you can check your job history UI, which provides similar information to the job tracker, since you are using MR2 and YARN.
The default port of the job history UI is 19888.

From: java8964@hotmail.com
To: user@hadoop.apache.org
Subject: RE: issue about Shuffled Maps in MR job summary
Date: Thu, 12 Dec 2013 10:06:37 -0500

Then you can check your job's status from the YARN ResourceManager web UI to identify what step your reducers are in.

Date: Thu, 12 Dec 2013 11:12:47 +0800
Subject: Re: issue about Shuffled Maps in MR job summary
From: justlooks@gmail.com
To: user@hadoop.apache.org

One important thing is that my input files are very small, each less than 10M, and I have a huge number of files.

On Thu, Dec 12, 2013 at 9:58 AM, java8964 <java8964@hotmail.com> wrote:

Assume the block size is 128M and each of your mappers finishes within half a minute. Then there is not much logic in your mapper, as it can process 128M in around 30 seconds. If your reducers cannot finish within 1 week, then something is wrong.

So you may need to find out the following:

1) How many mappers were generated in your MR job?
2) Have they all finished? (Check them in the jobtracker through the web UI or the command line.)
3) How many reducers are in this job?
4) Have the reducers started? What stage are they in? Copying/Sorting/Reducing?
5) If they are in the reducing stage, check the userlog of the reducers. Is your code running now?

You can find all of this information in the Job Tracker web UI.

Yong

Date: Thu, 12 Dec 2013 09:03:29 +0800
Subject: Re: issue about Shuffled Maps in MR job summary
From: justlooks@gmail.com
To: user@hadoop.apache.org

Hi,
    Suppose I have a cluster of 5 worker nodes, and each worker node can allocate 40G of memory. I do not care about the map tasks, because the map tasks in my job finish within half a minute. From what I observe, the really slow tasks are the reduce tasks. I allocate 12G to each reduce task, so each worker node can run 3 reducers in parallel and the whole cluster can support 15 reducers. I run the job with all 15 reducers. I do not know whether increasing the number of reducers from 15 to 30, with 6G of memory allocated to each, will speed up the job or not. The job runs in my production environment; it has been running for nearly 1 week and still has not finished.
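
For reference, the capacity arithmetic in the message above can be sketched in a few lines of Python. This is only an illustration: the function name `max_parallel_reducers` and its parameters are made up for this sketch, and it assumes memory is the only limit on how many reducers a node can run (it ignores CPU, running map tasks, and any scheduler overhead).

```python
def max_parallel_reducers(nodes, mem_per_node_gb, mem_per_reducer_gb):
    """Reducers that fit in the cluster at once if memory is the only limit."""
    # Whole reducers per node: leftover memory that cannot hold a reducer is unused.
    per_node = mem_per_node_gb // mem_per_reducer_gb
    return nodes * per_node


# Numbers from the message: 5 worker nodes with 40G each.
print(max_parallel_reducers(5, 40, 12))  # 15
print(max_parallel_reducers(5, 40, 6))   # 30
```

In both configurations 36G per node is used and 4G per node is left over.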

On Wed, Dec 11, 2013 at 9:50 PM, java8964 <java8964@hotmail.com> wrote:

The total job completion time depends on a lot of factors. Are you sure the reduce phase is the bottleneck?

It also depends on how many reducer input groups your MR job has. If you only have 20 reducer input groups, then even if you raise your reducer count to 40, the duration of the reduce phase won't change much, as the additional 20 reducer tasks won't get any data to process.

If you have a lot of reducer input groups, your cluster has spare capacity at this time, and you also have a lot of idle reducer slots, then increasing your reducer count should decrease your total job completion time.

Make sense?

Yong
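
The point about reducer input groups can be illustrated with a small Python sketch. It assumes keys are assigned to reducers by hash modulo the reducer count, which is the idea behind Hadoop's default HashPartitioner. The function name `reducers_with_data` and the integer keys are hypothetical and chosen only to keep the result deterministic.

```python
def reducers_with_data(keys, num_reducers):
    """Return how many reducers receive at least one key group."""
    # Each distinct key goes to exactly one reducer: hash(key) mod reducer count.
    return len({hash(key) % num_reducers for key in set(keys)})


groups = range(20)  # 20 distinct reducer input groups (illustrative keys)

print(reducers_with_data(groups, 20))  # 20
print(reducers_with_data(groups, 40))  # 20
```

However the keys happen to hash, the number of reducers that get data can never exceed the number of distinct groups, so the extra 20 reducers in the second call stay idle.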

Date: Wed, 11 Dec 2013 14:20:24 +0800
Subject: Re: issue about Shuffled Maps in MR job summary
From: justlooks@gmail.com
To: user@hadoop.apache.org


I read the docs and found that if I have 8 reducers, a map task will output 8 partitions, and each partition will be sent to a different reducer. So if I increase the number of reducers, the number of partitions increases, but the volume of network traffic stays the same. Why does increasing the number of reducers sometimes not decrease the job completion time?

On Wed, Dec 11, 2013 at 1:48 PM, Vinayakumar B <vinayakumar.b@huawei.com> wrote:

It looks simple :)

Shuffled Maps = Number of Map Tasks * Number of Reducers

Thanks and Regards,
Vinayakumar B

From: ch huang [mailto:justlooks@gmail.com]
Sent: 11 December 2013 10:56
To: user@hadoop.apache.org
Subject: issue about Shuffled Maps in MR job summary

Hi, mailing list:
    I ran terasort with 16 reducers and with 8 reducers. When I double the number of reducers, Shuffled Maps also doubles. My question is: the job only runs 20 map tasks (the input is 10 files in total, each file is 100M, and my block size is 64M, so there are 20 splits), so why do I need to shuffle 160 maps in the 8-reducer run and 320 maps in the 16-reducer run? How is the Shuffled Maps number calculated?

16 reducer summary output:

 Shuffled Maps =320

8 reducer summary output:

Shuffled Maps =160
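
The formula quoted earlier in this thread (Shuffled Maps = Number of Map Tasks * Number of Reducers) can be checked against the numbers in the original question with a short Python sketch. Every reducer fetches one partition from every map task's output, so the counter grows with both values. The function names here are illustrative, and the split count uses plain ceiling division per file, which is a simplifying assumption about how the input format computes splits.

```python
import math


def num_splits(num_files, file_mb, block_mb):
    """Approximate map task count: each file is cut into block-sized splits."""
    return num_files * math.ceil(file_mb / block_mb)


def shuffled_maps(map_tasks, reducers):
    """Each reducer fetches output from every map task."""
    return map_tasks * reducers


# Numbers from the question: 10 files of 100M each, 64M block size.
maps = num_splits(10, 100, 64)
print(maps)                     # 20
print(shuffled_maps(maps, 8))   # 160
print(shuffled_maps(maps, 16))  # 320
```

Both results match the Shuffled Maps values reported in the two job summaries.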


--_17bebd23-2797-4399-b85a-121514dd4014_--