Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of nitinpawar432@gmail.com
 designates 209.85.160.173 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <AE2103CC-A3B5-49BD-B705-93FF8B9FCB39@gmail.com>
References: 
 <CAJp7=sHTgJt59o3uyQ9kV9VBDPQyu90qP+op28A6E72YH+0ZWw@mail.gmail.com>
	<CAOcnVr1ZGrhFpdCHL160H_Qp_aqKMm3c4XG=wGoFYYNJOW2fyA@mail.gmail.com>
	<AE2103CC-A3B5-49BD-B705-93FF8B9FCB39@gmail.com>
Date: Thu, 31 Jul 2014 18:41:31 +0530
Message-ID: 
 <CAORpBsjfTNRLYfwitBA8svUzLNuwFPjrjJUZsKD+1J=CL-zAWw@mail.gmail.com>
Subject: Re: Performance on singlenode and multinode hadoop
From: Nitin Pawar <nitinpawar432@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=20cf3005ddc8edc8ea04ff7d0207

--20cf3005ddc8edc8ea04ff7d0207
Content-Type: text/plain; charset=UTF-8

What kind of work will your tasks be doing?
Are they CPU-intensive or only memory-intensive?
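
For reference, the two properties discussed in this thread go in the
TaskTracker's mapred-site.xml. Below is a minimal sketch for the 8-core
machine described in the quoted message. The values 6 and 2 are purely
illustrative assumptions (roughly one slot per core, weighted towards maps),
not a recommendation; the right numbers depend on the answer to the question
above and on how much memory each task JVM needs. Both properties default to
2, and the TaskTracker has to be restarted to pick up a change.

```xml
<!-- mapred-site.xml on the TaskTracker host (MRv1).
     Slot counts below are illustrative assumptions for an 8-core box. -->
<configuration>
  <property>
    <!-- Maximum number of map tasks this TaskTracker runs at once -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <!-- Maximum number of reduce tasks this TaskTracker runs at once -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

If the tasks turn out to be memory-bound rather than CPU-bound, the slot
counts would probably need to be lower than the core count, since every slot
runs its own child JVM.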


On Thu, Jul 31, 2014 at 6:28 PM, Sindhu Hosamane <sindhuht@gmail.com> wrote:

> Hello,
>
> I am running my experiment on a server with 2 processors (4 cores each),
> that is, 8 cores in total.
> What would be the ideal values for mapred.tasktracker.map.tasks.maximum
> and mapred.tasktracker.reduce.tasks.maximum to get maximum performance?
> Your help is very much appreciated.
>
>
> Regards,
> Sindhu
>
>
> On 29 Jul 2014, at 18:56, Harsh J <harsh@cloudera.com> wrote:
>
> > It isn't the DataNode that spawns and runs the compute work, but the
> > TaskTracker.
> >
> > If you wanted to increase MR parallelism on a single machine, you do
> > not need two DNs, nor two TTs, just higher slot capacities in your
> > TT's mapred-site.xml via properties
> > mapred.tasktracker.map.tasks.maximum and
> > mapred.tasktracker.reduce.tasks.maximum.
> >
> > On Mon, Jul 28, 2014 at 4:30 PM, sindhu hosamane <sindhuht@gmail.com> wrote:
> >> Hello,
> >>
> >> I set up 2 DataNodes on a single machine (an Ubuntu machine) as
> >> described in the thread
> >>
> >> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3CA3EF3F6AF24E204B812D1D24CCC8D71A03688F76@mse16be2.mse16.exchange.ms%3E
> >>
> >> The Ubuntu machine has 2 processors and 8 cores. Assuming that the
> >> machine is powerful, I set up 2 DataNodes on it.
> >>
> >> Now when I run jps on that multinode Hadoop setup, I get
> >> Namenode
> >> Datanode
> >> Datanode
> >> Jobtracker
> >> Tasktracker
> >> Secondary Namenode
> >>
> >> The above output shows that 2 DataNodes are up and running.
> >>
> >> I also have a single-node setup on that Ubuntu machine.
> >> When I compare performance on the single-node and multinode setups, both
> >> are almost the same. So how do I make sure that load is being distributed
> >> across both DataNodes, or that each DataNode uses different cores of the
> >> Ubuntu machine?
> >>
> >> (Note: I know that multiple DataNodes on the same machine is not that
> >> advantageous, but assuming my machine is powerful, I set it up.)
> >>
> >> I would appreciate any advice on this.
> >>
> >> Regards,
> >> Sindhu
> >
> >
> >
> > --
> > Harsh J
>
>


-- 
Nitin Pawar

--20cf3005ddc8edc8ea04ff7d0207--