Subject: Re: Hadoop 1.0.4 Performance Problem
From: Amit Sela <amits@infolinks.com>
To: user@hadoop.apache.org
Date: Tue, 27 Nov 2012 12:21:31 +0200

So this is a FairScheduler problem?
We are using the default Hadoop scheduler. Is there a reason to use the Fair Scheduler if most of the time we don't have more than 4 jobs running simultaneously?
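
For context, the scheduler is selected in mapred-site.xml via the
mapred.jobtracker.taskScheduler property; if it is left unset you get the
default FIFO scheduler (org.apache.hadoop.mapred.JobQueueTaskScheduler).
Switching to the Fair Scheduler would presumably look something like this,
just a sketch, and it assumes the contrib fairscheduler jar is on the
JobTracker classpath:

  <!-- Replace the default FIFO scheduler with the Fair Scheduler -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>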

On Tue, Nov 27, 2012 at 12:00 PM, Harsh J <harsh@cloudera.com> wrote:
Hi Amit,

He means the mapred.fairscheduler.assignmultiple FairScheduler
property. It is true by default, which works well for most workloads,
if not for benchmark-style workloads. I would not usually trust a
benchmark as a baseline performance measure of everything that comes out
of an upgrade.
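
If you do stay on the Fair Scheduler and want the pre-upgrade behaviour
back, the setting being discussed would go into mapred-site.xml roughly
like this (a sketch; it only has an effect when the Fair Scheduler is the
active scheduler):

  <!-- Assign at most one map and one reduce per heartbeat, as before MAPREDUCE-2981 -->
  <property>
    <name>mapred.fairscheduler.assignmultiple</name>
    <value>false</value>
  </property>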

The other JIRA, MAPREDUCE-4451, has been resolved for 1.2.0.

On Tue, Nov 27, 2012 at 3:20 PM, Amit Sela <amits@infolinks.com> wrote:
> Hi Jon,
>
> I recently upgraded our cluster from Hadoop 0.20.3-append to Hadoop 1.0.4
> and I haven't noticed any performance issues. By "multiple assignment
> feature" do you mean speculative execution
> (mapred.map.tasks.speculative.execution and
> mapred.reduce.tasks.speculative.execution) ?
>
>
> On Mon, Nov 26, 2012 at 11:49 PM, Jon Allen <jayayedev@gmail.com> wrote:
>>
>> Problem solved, but worth warning others about.
>>
>> Before the upgrade the reducers for the terasort process had been evenly
>> distributed around the cluster - one per task tracker in turn, looping
>> around the cluster until all tasks were allocated. After the upgrade all
>> reduce tasks had been submitted to a small number of task trackers - submit
>> tasks until the task tracker slots were full and then move onto the next
>> task tracker. Skewing the reducers like this quite clearly hit the
>> benchmark performance.
>>
>> The reason for this turns out to be the fair scheduler rewrite
>> (MAPREDUCE-2981) that appears to have subtly modified the behaviour of the
>> assign multiple property. Previously this property caused a single map and a
>> single reduce task to be allocated in a task tracker heartbeat (rather than
>> the default of a map or a reduce). After the upgrade it allocates as many
>> tasks as there are available task slots. Turning off the multiple
>> assignment feature returned the terasort to its pre-upgrade performance.
>>
>> I can see potential benefits to this change and need to think through the
>> consequences for real-world applications (though in practice we're likely to
>> move away from the fair scheduler due to MAPREDUCE-4451). Investigating this
>> has been a pain, so to warn other users: is there anywhere central that can be
>> used to record upgrade gotchas like this?
>>
>>
>> On Fri, Nov 23, 2012 at 12:02 PM, Jon Allen <jayayedev@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We've just upgraded our cluster from Hadoop 0.20.203 to 1.0.4 and have
>>> hit performance problems. Before the upgrade a 15TB terasort took about 45
>>> minutes, afterwards it takes just over an hour. Looking in more detail it
>>> appears the shuffle phase has increased from 20 minutes to 40 minutes. Does
>>> anyone have any thoughts about what's changed between these releases that
>>> may have caused this?
>>>
>>> The only change to the system has been to Hadoop. We moved from a
>>> tarball install of 0.20.203 with all processes running as hadoop to an RPM
>>> deployment of 1.0.4 with processes running as hdfs and mapred. Nothing else
>>> has changed.
>>>
>>> As a related question, we're still running with a configuration that was
>>> tuned for version 0.20.1. Are there any recommendations for tuning
>>> properties that have been introduced in recent versions that are worth
>>> investigating?
>>>
>>> Thanks,
>>> Jon
>>
>>
>



--
Harsh J
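
For reference, the speculative execution properties Amit asked about
earlier in the thread are plain job/site configuration knobs, separate
from the Fair Scheduler's assignmultiple setting; a sketch of how they
appear in mapred-site.xml (both default to true in the 1.x line):

  <!-- Allow speculative (duplicate) attempts of slow map and reduce tasks -->
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
  </property>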
