From: "David Parks" <davidparks21@yahoo.com>
To: user@hadoop.apache.org
Subject: RE: How can I limit reducers to one-per-node?
Date: Mon, 11 Feb 2013 09:29:39 +0700

I guess the FairScheduler is doing multiple assignments per heartbeat, hence the behavior of multiple reduce tasks per node even when they should otherwise be fully distributed.
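(If multiple assignment per heartbeat is indeed the cause, the 1.x FairScheduler seems to expose a switch for it; a sketch of what I'd try in the JobTracker's mapred-site.xml, assuming mapred.fairscheduler.assignmultiple behaves as its name suggests and covers reduce tasks:)

    <!-- Sketch only: assumes the 1.x FairScheduler's assignmultiple
         property limits task assignment to one per heartbeat.
         Goes in mapred-site.xml on the JobTracker; restart to apply. -->
    <property>
      <name>mapred.fairscheduler.assignmultiple</name>
      <value>false</value>
    </property>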

 

Will adding a combiner change this behavior? Could you explain more?

 

Thanks!

David

 

 

From: Michael Segel [mailto:michael_segel@hotmail.com]
Sent: Monday, February 11, 2013 8:30 AM
To: user@hadoop.apache.org
Subject: Re: How can I limit reducers to one-per-node?

 

Adding a combiner step first, then reduce?

 

 

On Feb 8, 2013, at 11:18 PM, Harsh J <harsh@cloudera.com> wrote:



Hey David,

There's no readily available way to do this today (you may be interested in MAPREDUCE-199, though), but if your job scheduler isn't doing multiple assignments of reduce tasks, then only one is assigned per TT heartbeat, which gives you almost what you're looking for: one reduce task per node, round-robin'd (roughly).
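As a rough sketch (my assumptions: a 1.x cluster, the old mapred API, and placeholder class/job wiring), you can size the job at one reduce task per live TaskTracker and let per-heartbeat assignment spread them around:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class OneReducePerNode {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(OneReducePerNode.class);
        // ... configure mapper/reducer and input/output paths here ...
        // One reduce task per live TaskTracker; with single assignment
        // per heartbeat these land roughly one per node.
        JobClient client = new JobClient(conf);
        conf.setNumReduceTasks(client.getClusterStatus().getTaskTrackers());
        JobClient.runJob(conf);
      }
    }

Nothing pins a task to a node, of course; it just makes over-subscription unlikely when the scheduler hands out one reduce task per heartbeat.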

On Sat, Feb 9, 2013 at 9:24 AM, David Parks <davidparks21@yahoo.com> wrote:

I have a cluster of boxes with 3 reducers per node. I want to limit a particular job to only run 1 reducer per node.



This job is network IO bound, gathering images from a set of webservers.



My job has certain parameters set to meet "web politeness" standards (e.g., limits on connections and connection frequency).



If this job runs from multiple reducers on the same node, those per-host limits will be violated. Also, this is a shared environment and I don't want long-running, network-bound jobs uselessly taking up all the reduce slots.
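(For context, the per-host limits are enforced inside the reducer with a small throttle along these lines; the class name and the two-second interval are illustrative, not from any library:)

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative per-host throttle: enforce a minimum interval between
    // requests to the same host. One instance lives in each reduce task's
    // JVM, so three reducers on one node would triple the effective
    // per-host rate, which is exactly why one reducer per node matters.
    public class HostThrottle {
      private static final long HOST_INTERVAL_MS = 2000L;
      private final Map<String, Long> lastRequest = new HashMap<String, Long>();

      public synchronized void awaitTurn(String host) throws InterruptedException {
        Long last = lastRequest.get(host);
        long now = System.currentTimeMillis();
        if (last != null && now - last < HOST_INTERVAL_MS) {
          Thread.sleep(HOST_INTERVAL_MS - (now - last));
        }
        lastRequest.put(host, System.currentTimeMillis());
      }
    }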




--
Harsh J

 

Michael Segel | (m) 312.755.9623

Segel and Associates

 
