Subject: Re: Why my tests shows Yarn is worse than MRv1 for terasort?
From: sam liu <samliuhadoop@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 7 Jun 2013 13:21:49 +0800

The terasort execution log shows that reduce spent about 5.5 mins from 33% to 35%, as below:
13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
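For context on where that gap falls: the reduce progress counter in MapReduce covers three phases, roughly copy/shuffle up to 33%, sort/merge from 33% to 66%, and the actual reduce calls from 67% on, so a stall just past 33% points at the reduce-side merge rather than at the maps. A minimal sketch of the mapred-site.xml knobs usually examined for that phase (the property names are the stock MR2 ones; the values are illustrative only, not taken from this thread):

  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>10</value><!-- default 5: fetch threads per reducer -->
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>50</value><!-- default 10: streams merged at once, so fewer merge passes -->
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
    <value>0.70</value><!-- share of the reducer heap used to hold fetched map output -->
  </property>
  <property>
    <name>mapreduce.job.reduce.slowstart.completedmaps</name>
    <value>0.80</value><!-- default 0.05: start reducers only after 80% of maps finish -->
  </property>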

Anyway, below are my configurations for your reference. Thanks!
(A) core-site.xml
only define 'fs.default.name' and 'hadoop.tmp.dir'

(B) hdfs-site.xml
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
  </property>

  <property>
    <name>dfs.block.size</name>
    <value>134217728</value><!-- 128MB -->
  </property>

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>64</value>
  </property>

  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>

(C) mapred-site.xml
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
    <description>No description</description>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
    <description>No description</description>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.child.java.opts</name>
    <value>-Xmx1000m</value>
  </property>

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>

  <property>
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>

  <property>
    <name>mapreduce.tasktracker.outofband.heartbeat</name>
    <value>true</value>
  </property>

(D) yarn-site.xml
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node1:18025</value>
    <description>host is the hostname of the resource manager and
    port is the port on which the NodeManagers contact the Resource Manager.
    </description>
  </property>

  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node1:18088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node1:18030</value>
    <description>host is the hostname of the resourcemanager and port is the port
    on which the Applications in the cluster talk to the Resource Manager.
    </description>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node1:18040</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on
    which the clients can talk to the Resource Manager.</description>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
    <description>the local directories used by the nodemanager</description>
  </property>

  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:18050</value>
    <description>the nodemanagers bind to this port</description>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager in MB</description>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
    <description>directory on hdfs where the application logs are moved to</description>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
    <description>the directories used by Nodemanagers as log directories</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
  </property>
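One thing worth double-checking next to that aux-service entry: the 2.0.x cluster-setup docs pair it with an explicit handler class. A small sketch, assuming it is not already being picked up from yarn-default.xml on this build:

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>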

  <property>
    <name>yarn.resourcemanager.client.thread-count</name>
    <value>64</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.cpu-cores</name>
    <value>24</value>
  </property>

  <property>
    <name>yarn.nodemanager.vcores-pcores-ratio</name>
    <value>3</value>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>22000</value>
  </property>

  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>



2013/6/7 Harsh J <harsh@cloudera.com>
Not tuning configurations at all is wrong. YARN uses memory resource
based scheduling and hence MR2 would be requesting 1 GB minimum by
default, causing, on base configs, to max out at 8 (due to 8 GB NM
memory resource config) total containers. Do share your configs as at
this point none of us can tell what it is.

Obviously, it isn't our goal to make MR2 slower for users and to not care about such things :)
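To put rough numbers on the point above: with yarn.nodemanager.resource.memory-mb at its 8192 MB default and 1024 MB requested per map/reduce container, a node tops out at 8 concurrent containers, whereas the MRv1 cluster gets 8 map + 4 reduce slots per node from the tasktracker settings (which YARN ignores). A hedged sketch of how the per-container requests are usually sized against the NodeManager figure; the values are illustrative only:

  <!-- mapred-site.xml: per-container memory requests and matching JVM heaps.
       Illustrative: 22000 MB per NodeManager / 1536 MB per container allows
       roughly 14 concurrent containers per node. -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1200m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1200m</value>
  </property>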

On Fri, Jun 7, 2013 at 8:45 AM, sam liu <samliuhadoop@gmail.com> wrote:
> At the beginning, I just wanted to do a fast comparison of MRv1 and Yarn. But
> they have many differences, and to be fair for comparison I did not tune
> their configurations at all. So I got the above test results. After analyzing
> the test results, no doubt, I will configure them and do the comparison again.
>
> Do you have any idea on the current test result? I think, compared with MRv1,
> Yarn is better on the Map phase (teragen test), but worse on the Reduce
> phase (terasort test).
> And any detailed suggestions/comments/materials on Yarn performance tuning?
>
> Thanks!
>
>
> 2013/6/7 Marcos Luis Ortiz Valmaseda <marcosluis2186@gmail.com>
>>
>> Why not tune the configurations?
>> Both frameworks have many areas to tune:
>> - Combiners, Shuffle optimization, Block size, etc
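For the "Shuffle optimization" item, the usual first steps on the MR2 side are compressing map output and widening the map-side sort buffer; a small sketch (illustrative values; the Snappy codec assumes the native libraries are installed):

  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>256</value><!-- default 100; must fit inside the map task heap -->
  </property>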
>>
>>
>>
>> 2013/6/6 sam liu <samliuhadoop@gmail.com>
>>>
>>> Hi Experts,
>>>
>>> We are thinking about whether to use Yarn or not in the near future, and
>>> I ran teragen/terasort on Yarn and MRv1 for comparison.
>>>
>>> My env is a three-node cluster, and each node has similar hardware: 2
>>> cpu (4 core), 32 GB mem. Both the Yarn and MRv1 clusters are set up on the same env. To
>>> be fair, I did not do any performance tuning on their configurations, but
>>> used the default configuration values.
>>>
>>> Before testing, I thought Yarn would be much better than MRv1, if they all
>>> use the default configuration, because Yarn is a better framework than MRv1.
>>> However, the test result shows some differences:
>>>
>>> MRv1: Hadoop-1.1.1
>>> Yarn: Hadoop-2.0.4
>>>
>>> (A) Teragen: generate 10 GB data:
>>> - MRv1: 193 sec
>>> - Yarn: 69 sec
>>> Yarn is 2.8 times better than MRv1
>>>
>>> (B) Terasort: sort 10 GB data:
>>> - MRv1: 451 sec
>>> - Yarn: 1136 sec
>>> Yarn is 2.5 times worse than MRv1
>>>
>>> After a fast analysis, I think the direct cause might be that Yarn is
>>> much faster than MRv1 in the Map phase, but much worse in the Reduce phase.
>>>
>>> Here I have two questions:
>>> - Why do my tests show Yarn is worse than MRv1 for terasort?
>>> - What's the strategy for tuning Yarn performance? Are there any materials?
>>>
>>> Thanks!
>>
>>
>>
>>
>> --
>> Marcos Ortiz Valmaseda
>> Product Manager at PDVSA
>> http://about.me/marcosortiz
>>
>



--
Harsh J
