Date: Fri, 7 Jun 2013 11:15:43 +0800
Subject: Re: Why my tests shows Yarn is worse than MRv1 for terasort?
From: sam liu <samliuhadoop@gmail.com>
To: Marcos Luis Ortiz Valmaseda
Cc: user@hadoop.apache.org

At the beginning, I just wanted to do a quick comparison of MRv1 and Yarn. They have many differences, and to keep the comparison fair I did not tune either configuration at all, which is how I got the test results above. After analyzing the results, I will of course configure both and compare again.

Do you have any thoughts on the current results? My reading is that, compared with MRv1, Yarn is better in the Map phase (teragen test) but worse in the Reduce phase (terasort test). Also, do you have any detailed suggestions/comments/materials on Yarn performance tuning?

Thanks!

2013/6/7 Marcos Luis Ortiz Valmaseda <marcosluis2186@gmail.com>
> Why not tune the configurations?
> Both frameworks have many areas to tune:
> - Combiners, shuffle optimization, block size, etc.
>
>
> 2013/6/6 sam liu <samliuhadoop@gmail.com>
>> Hi Experts,
>>
>> We are thinking about whether to use Yarn in the near future, so I ran
>> teragen/terasort on Yarn and MRv1 for comparison.
>>
>> My environment is a three-node cluster, and each node has similar
>> hardware: 2 CPUs (4 cores), 32 GB memory. Both the Yarn and MRv1 clusters
>> are set up on the same environment. To be fair, I did not do any
>> performance tuning of their configurations, but used the default
>> configuration values.
>>
>> Before testing, I expected Yarn to be much better than MRv1 if both used
>> the default configuration, because Yarn is a better framework than MRv1.
>> However, the test results show some differences:
>>
>> MRv1: Hadoop-1.1.1
>> Yarn: Hadoop-2.0.4
>>
>> (A) Teragen: generate 10 GB of data:
>> - MRv1: 193 sec
>> - Yarn: 69 sec
>> *Yarn is 2.8 times faster than MRv1*
>>
>> (B) Terasort: sort 10 GB of data:
>> - MRv1: 451 sec
>> - Yarn: 1136 sec
>> *Yarn is 2.5 times slower than MRv1*
>>
>> After a quick analysis, I think the direct cause might be that Yarn is
>> much faster than MRv1 in the Map phase, but much slower in the Reduce
>> phase.
>>
>> Here I have two questions:
>> *- Why do my tests show Yarn is worse than MRv1 for terasort?*
>> *- What is the strategy for tuning Yarn performance? Are there any
>> materials on it?*
>>
>> Thanks!
>
>
> --
> Marcos Ortiz Valmaseda
> Product Manager at PDVSA
> http://about.me/marcosortiz
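[As a side note on the tuning areas Marcos mentions: in terasort the reduce-phase gap is usually dominated by shuffle/sort settings and by the number of reducers. Below is a minimal sketch of the Hadoop 2.x shuffle-related properties one might adjust in mapred-site.xml. The values shown are illustrative starting points only, not settings verified against this cluster.]

```xml
<!-- mapred-site.xml: shuffle/sort knobs that affect the reduce phase.
     Values are illustrative; Hadoop 2.x defaults noted in comments. -->
<configuration>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>256</value>   <!-- map-side sort buffer in MB; default 100 -->
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>64</value>    <!-- streams merged at once during sort; default 10 -->
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>20</value>    <!-- parallel fetch threads per reducer; default 5 -->
  </property>
  <property>
    <name>mapreduce.job.reduces</name>
    <value>6</value>     <!-- reducers per job; default is 1, which serializes
                              the entire sort on a single reducer -->
  </property>
</configuration>
```

[On a three-node cluster it is also worth checking that yarn.nodemanager.resource.memory-mb and the per-container memory settings allow Yarn to run as many concurrent reduce tasks as MRv1's mapred.tasktracker.reduce.tasks.maximum slots did; otherwise the two frameworks are not sorting with the same parallelism.]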