From: Robert Evans <evans@yahoo-inc.com>
To: mapreduce-user@hadoop.apache.org
Date: Thu, 10 May 2012 08:29:55 -0500
Subject: Re: max 1 mapper per node

Yes, adding more resources to the scheduling request would be the ideal solution to the problem, but sadly that is not a trivial change. The initial solution I suggested is an ugly hack and will not work for the cases you have described. If you feel this is important work, please feel free to file a JIRA for it; we can continue the discussion on that JIRA about the details of how to add this type of functionality. I am very interested in the scheduler and would be happy to help out, but sadly my time right now is very limited.

--Bobby Evans

On 5/10/12 6:56 AM, "Radim Kolar" <hsn@filez.com> wrote:

> We've been against these 'features' since it leads to very bad
> behaviour across the cluster with multiple apps/users etc.
It's not a new feature; it's an extension of the existing resource scheduling, which works well enough only for RAM. There are two other resources - CPU cores and network IO - that need to be considered.

We have a job that does a lot of network IO in its mappers, and it is desirable to run those mappers on different nodes even if reading blocks from HDFS will then not be local. Our second job burns all the CPU cores on a machine while doing its computations, so it is important that its mappers do not land on the same node.
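For later readers of this thread: Hadoop 2.x YARN did eventually add a CPU dimension (vcores) next to memory in the container request, which covers the CPU-bound case described above. Below is a minimal sketch, not the mechanism discussed in this thread, of what such a multi-resource request looks like from an application master using the Hadoop 2.x AMRMClient API; the memory and vcore numbers are made up for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MultiResourceRequestSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();

        // Client an application master uses to ask the ResourceManager for containers.
        AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
        amrmClient.init(conf);
        amrmClient.start();

        // The "more resources in the scheduling request" idea: each container asks
        // for memory *and* CPU, so a CPU-hungry mapper claims enough vcores that the
        // scheduler cannot pack many of them onto one node. Numbers are illustrative.
        Resource capability = Resource.newInstance(2048 /* MB */, 4 /* vcores */);
        Priority priority = Priority.newInstance(0);

        // nodes/racks left null: no hard locality constraint, matching the
        // network-IO job above where spreading matters more than block locality.
        ContainerRequest request = new ContainerRequest(capability, null, null, priority);
        amrmClient.addContainerRequest(request);

        // A real AM would call registerApplicationMaster(...) first and then loop on
        // allocate() to receive containers and launch tasks; omitted to keep this short.
        amrmClient.stop();
    }
}

Network IO, the other resource mentioned above, is not covered by this; that part would still need the kind of scheduler work Bobby suggests tracking in a JIRA.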