From: Harsh J <harsh@cloudera.com>
To: user@hadoop.apache.org
Date: Sat, 1 Feb 2014 06:16:39 +0530
Subject: Re: Force one mapper per machine (not core)?

If it's the JobTracker you use, it's MR1.

On Feb 1, 2014 12:23 AM, "Keith Wiley" <kwiley@keithwiley.com> wrote:
> Hmmm, okay. I know it's running CDH4 4.4.0, but as for whether it was
> specifically configured with MR1 or MR2 (is there a distinction between MR2
> and YARN?) I'm not absolutely certain. I know that the cluster "behaves"
> like the MR1 clusters I've worked with for years (I interact with the job
> tracker in a classical way, for example). Can I tell whether it's MR1 or
> MR2 from the JobTracker or NameNode web UIs?
>
> Thanks.
>
> On Jan 29, 2014, at 00:52, Harsh J wrote:
>
> > Is your cluster running MR1 or MR2? On MR1, the CapacityScheduler
> > would allow you to do this if you used appropriate memory-based
> > requests (see http://search-hadoop.com/m/gnFs91yIg1e), and on MR2
> > (depending on the YARN scheduler's resource-request limit config) you
> > can submit your job with the largest possible requests, so that each
> > task soaks up all the provided resources (CPU and memory) of a node and
> > only one container runs on a host at any given time.
> >
> > On Wed, Jan 29, 2014 at 3:30 AM, Keith Wiley <kwiley@keithwiley.com> wrote:
> >> I'm running a program whose streaming layer automatically multithreads
> >> by detecting the number of cores on the machine. I realize this model
> >> is somewhat in conflict with Hadoop, but nonetheless, that's what I'm
> >> doing. Thus, for even resource utilization, it would be nice to assign
> >> not just one mapper per core, but only one mapper per machine. I
> >> realize that if I saturate the cluster none of this really matters,
> >> but consider the following example for clarity: 4-core nodes, 10-node
> >> cluster, thus 40 slots, fully configured across mappers and reducers
> >> (40 slots of each). Say I run this program with just two mappers. It
> >> would run much more efficiently (in essentially half the time) if I
> >> could force the two mappers onto slots on two separate machines instead
> >> of running the risk that Hadoop assigns them both to the same machine.
> >>
> >> Can this be done?
> >>
> >> Thanks.
>
> ________________________________________________________________________________
> Keith Wiley     kwiley@keithwiley.com     keithwiley.com     music.keithwiley.com
>
> "I used to be with it, but then they changed what it was. Now, what I'm with
> isn't it, and what's it seems weird and scary to me."
>                       -- Abe (Grandpa) Simpson
> ________________________________________________________________________________
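
[Archive note] The MR2 approach Harsh describes — requesting a whole node's worth of resources per map task — can be sketched as a streaming job submission. Everything below is an assumption for illustration: the jar path, the input/output paths, the mapper name, and the resource values (sized for hypothetical NodeManagers advertising 4 vcores and 15360 MB). `mapreduce.map.memory.mb` and `mapreduce.map.cpu.vcores` are the standard MR2 job properties for per-map-task resource requests.

```shell
# Sketch only: values assume yarn.nodemanager.resource.memory-mb=15360
# and yarn.nodemanager.resource.cpu-vcores=4 on every node.
# By requesting all of a node's memory and vcores for each map task,
# the scheduler can place at most one map container per host.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -D mapreduce.map.memory.mb=15360 \
  -D mapreduce.map.cpu.vcores=4 \
  -input /user/kwiley/input \
  -output /user/kwiley/output \
  -mapper ./multithreaded-program \
  -file multithreaded-program
```

Note that vcore requests are only enforced when the scheduler is configured to consider CPU (e.g. the CapacityScheduler with `DominantResourceCalculator`); with the default memory-only calculator, the memory request alone does the isolating. On MR1, the rough equivalent is setting `mapred.job.map.memory.mb` to the node's full capacity under a CapacityScheduler with memory-based scheduling enabled.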