hadoop-mapreduce-user mailing list archives

From Jian Fang <jian.fang.subscr...@gmail.com>
Subject Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Date Wed, 23 Oct 2013 07:23:49 GMT
Unfortunately, turning off JVM reuse still gave the same result, i.e., about
90 minutes for MR2. I don't think the killed reduces could account for a 2x
slowdown. Something must be very wrong, either in the configuration or in
the code. Any hints?
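
For reference, on the MR1 side JVM reuse is controlled by
mapred.job.reuse.jvm.num.tasks (if I remember correctly, 1 means no reuse
and -1 means unlimited reuse), so turning it off amounts to something like:

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value> <!-- 1 = no JVM reuse, matching MR2's behavior -->
  </property>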



On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <jian.fang.subscribe@gmail.com> wrote:

> Thanks Sandy. I will try to turn JVM reuse off and see what happens.
>
> Yes, I saw quite a few exceptions in the task attempts. For instance:
>
>
> 2013-10-20 03:13:58,751 ERROR [main]
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
> 2013-10-20 03:13:58,752 ERROR [Thread-6] org.apache.hadoop.hdfs.DFSClient:
> Failed to close file
> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on
> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
> File does not exist. Holder
> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
> any open files.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>         at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>         at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> --
>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>         at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>         at
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>         at
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>         at
> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>         at
> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>         at
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>         at
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> 2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.nio.channels.ClosedChannelException
>         at
> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>         at
> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>         at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>         at
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>         at
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>         at
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>         at
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>         at
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>
>
>
>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>
>> It looks like many of your reduce tasks were killed.  Do you know why?
>>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
>> MR1 with JVM reuse turned off.
>>
>> -Sandy
>>
>>
>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <jian.fang.subscribe@gmail.com> wrote:
>>
>>> The Terasort output for MR2 is as follows.
>>>
>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>>> Counters: 46
>>>         File System Counters
>>>                 FILE: Number of bytes read=456102049355
>>>                 FILE: Number of bytes written=897246250517
>>>                 FILE: Number of read operations=0
>>>                 FILE: Number of large read operations=0
>>>                 FILE: Number of write operations=0
>>>                 HDFS: Number of bytes read=1000000851200
>>>                 HDFS: Number of bytes written=1000000000000
>>>                 HDFS: Number of read operations=32131
>>>                 HDFS: Number of large read operations=0
>>>                 HDFS: Number of write operations=224
>>>         Job Counters
>>>                 Killed map tasks=1
>>>                 Killed reduce tasks=20
>>>                 Launched map tasks=7601
>>>                 Launched reduce tasks=132
>>>                 Data-local map tasks=7591
>>>                 Rack-local map tasks=10
>>>                 Total time spent by all maps in occupied slots (ms)=1696141311
>>>                 Total time spent by all reduces in occupied slots (ms)=2664045096
>>>         Map-Reduce Framework
>>>                 Map input records=10000000000
>>>                 Map output records=10000000000
>>>                 Map output bytes=1020000000000
>>>                 Map output materialized bytes=440486356802
>>>                 Input split bytes=851200
>>>                 Combine input records=0
>>>                 Combine output records=0
>>>                 Reduce input groups=10000000000
>>>                 Reduce shuffle bytes=440486356802
>>>                 Reduce input records=10000000000
>>>                 Reduce output records=10000000000
>>>                 Spilled Records=20000000000
>>>                 Shuffled Maps =851200
>>>                 Failed Shuffles=61
>>>                 Merged Map outputs=851200
>>>                 GC time elapsed (ms)=4215666
>>>                 CPU time spent (ms)=192433000
>>>                 Physical memory (bytes) snapshot=3349356380160
>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>                 Total committed heap usage (bytes)=3636854259712
>>>         Shuffle Errors
>>>                 BAD_ID=0
>>>                 CONNECTION=0
>>>                 IO_ERROR=4
>>>                 WRONG_LENGTH=0
>>>                 WRONG_MAP=0
>>>                 WRONG_REDUCE=0
>>>         File Input Format Counters
>>>                 Bytes Read=1000000000000
>>>         File Output Format Counters
>>>                 Bytes Written=1000000000000
>>>
>>> Thanks,
>>>
>>> John
>>>
>>>
>>>
>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3, and
>>>> it turned out that terasort on MR2 is two times slower than on MR1.
>>>> I can hardly believe it.
>>>>
>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>>> configurations are as follows.
>>>>
>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>         mapreduce.map.memory.mb = "768";
>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>
>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>
>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>         mapreduce.task.io.sort.factor = "48";
>>>>         mapreduce.task.io.sort.mb = "200";
>>>>         mapreduce.map.speculative = "true";
>>>>         mapreduce.reduce.speculative = "true";
>>>>         mapreduce.framework.name = "yarn";
>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>>         mapreduce.map.cpu.vcores = "1";
>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>
>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>         mapreduce.map.output.compress = "true";
>>>>         mapreduce.map.output.compress.codec =
>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>
>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>         yarn.resourcemanager.resource-tracker.client.thread-count =
>>>> "64";
>>>>         yarn.resourcemanager.scheduler.class =
>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>         yarn.nodemanager.container-executor.class =
>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
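>>>>
>>>> (If my arithmetic is right, with 12288 MB per NodeManager the above
>>>> works out to roughly 12288 / 768 = 16 concurrent map containers or
>>>> 12288 / 2048 = 6 reduce containers per node.)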
>>>>
>>>> My Hadoop 1.0.3 cluster has the same memory/disks/cores and almost the
>>>> same other configuration. In MR1, the 1TB terasort took about 45 minutes,
>>>> but it took around 90 minutes in MR2.
>>>>
>>>> Does anyone know what is wrong here? Or do I need some special
>>>> configurations for terasort to work better in MR2?
>>>>
>>>> Thanks in advance,
>>>>
>>>> John
>>>>
>>>>
>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>> michael_segel@hotmail.com> wrote:
>>>>
>>>>> Sam,
>>>>> I think your cluster is too small for any meaningful conclusions to be
>>>>> made.
>>>>>
>>>>>
>>>>> Sent from a remote device. Please excuse any typos...
>>>>>
>>>>> Mike Segel
>>>>>
>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>>
>>>>> Hi Harsh,
>>>>>
>>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>>> cluster has improved a lot after increasing the reducer number
>>>>> (mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>>> questions about the way Yarn executes an MRv1 job:
>>>>>
>>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as I
>>>>> understand it, an MRv1 job is executed only by the ApplicationMaster.
>>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x, so
>>>>> how does Yarn execute an MRv1 job? Does it still include the special MR
>>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>> - What is the general process for the ApplicationMaster in Yarn to
>>>>> execute a job?
>>>>>
>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>>> containers instead, right?
>>>>> - For Yarn, we can set the total physical memory for a NodeManager using
>>>>> 'yarn.nodemanager.resource.memory-mb'. But how do we set the default size
>>>>> of physical memory for a container?
>>>>> - How do we set the maximum size of physical memory for a container? By
>>>>> the parameter 'mapred.child.java.opts'?
>>>>>
>>>>> Thanks as always!
>>>>>
>>>>> 2013/6/9 Harsh J <harsh@cloudera.com>
>>>>>
>>>>>> Hi Sam,
>>>>>>
>>>>>> > - How do I know the container number? Why do you say it will be 22
>>>>>> > containers due to 22 GB of memory?
>>>>>>
>>>>>> MR2's default configuration requests a 1 GB resource each for the Map
>>>>>> and Reduce containers, plus 1.5 GB for the AM container that runs the
>>>>>> job. This is tunable using the properties Sandy mentioned in his post.
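>>>>>>
>>>>>> For reference, the knobs involved (with what I believe are the stock
>>>>>> defaults) look roughly like this in mapred-site.xml:
>>>>>>
>>>>>>   <property>
>>>>>>     <name>mapreduce.map.memory.mb</name>
>>>>>>     <value>1024</value> <!-- container size requested per map task -->
>>>>>>   </property>
>>>>>>   <property>
>>>>>>     <name>mapreduce.reduce.memory.mb</name>
>>>>>>     <value>1024</value> <!-- container size requested per reduce task -->
>>>>>>   </property>
>>>>>>   <property>
>>>>>>     <name>yarn.app.mapreduce.am.resource.mb</name>
>>>>>>     <value>1536</value> <!-- container size requested for the MR AM -->
>>>>>>   </property>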
>>>>>>
>>>>>> > - My machine has 32 GB of memory; how much memory is proper to be
>>>>>> > assigned to containers?
>>>>>>
>>>>>> This is a general question. You may use the same process you took to
>>>>>> decide the optimal number of slots in MR1 to decide this here. Every
>>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>>> lower the # of concurrent containers at a given time (runtime change),
>>>>>> or lower the NM's published memory resources to control the same
>>>>>> (config change).
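>>>>>>
>>>>>> As a rough illustration (assuming 1 GB task containers): with
>>>>>> yarn.nodemanager.resource.memory-mb = 22000, each NM can fit about
>>>>>> 22000 / 1024 ~ 21-22 concurrent containers, while dropping it to 12000
>>>>>> caps that at about 12000 / 1024 ~ 11, much closer to the 12 MR1 slots
>>>>>> per node you had.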
>>>>>>
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> > 'yarn', will other parameters in mapred-site.xml still work in the
>>>>>> > yarn framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>> > 'mapreduce.map.sort.spill.percent'?
>>>>>>
>>>>>> Yes, all of these properties will still work. Old properties specific
>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>>>> name) will not apply anymore.
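>>>>>>
>>>>>> For example, going by the config you pasted: mapreduce.task.io.sort.mb
>>>>>> and mapreduce.map.sort.spill.percent still take effect under YARN,
>>>>>> while mapreduce.tasktracker.map.tasks.maximum and
>>>>>> mapreduce.tasktracker.reduce.tasks.maximum are TaskTracker settings and
>>>>>> are, as far as I know, simply ignored.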
>>>>>>
>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <samliuhadoop@gmail.com>
>>>>>> wrote:
>>>>>> > Hi Harsh,
>>>>>> >
>>>>>> > According to the above suggestions, I removed the duplicated setting
>>>>>> > and reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then
>>>>>> > the efficiency improved by about 18%.  I have questions:
>>>>>> >
>>>>>> > - How do I know the container number? Why do you say it will be 22
>>>>>> > containers due to 22 GB of memory?
>>>>>> > - My machine has 32 GB of memory; how much memory is proper to be
>>>>>> > assigned to containers?
>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>>> > 'yarn', will other parameters in mapred-site.xml still work in the
>>>>>> > yarn framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>> > 'mapreduce.map.sort.spill.percent'?
>>>>>> >
>>>>>> > Thanks!
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > 2013/6/8 Harsh J <harsh@cloudera.com>
>>>>>> >>
>>>>>> >> Hey Sam,
>>>>>> >>
>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>>> >> for both test runs.
>>>>>> >>
>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>>>>>> >> > Hey Sam,
>>>>>> >> >
>>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>>> >> > what's causing the difference.
>>>>>> >> >
>>>>>> >> > A couple observations:
>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>>> >> > there twice with two different values.
>>>>>> >> >
>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>>> >> > default mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>> >> > tasks getting killed for running over resource limits?
>>>>>> >> >
>>>>>> >> > -Sandy
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>>> >> >>
>>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>>> >> >> mins from 33% to 35% as below.
>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>> >> >>
>>>>>> >> >> Anyway, below are my configurations for your reference. Thanks!
>>>>>> >> >> (A) core-site.xml
>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>> >> >>
>>>>>> >> >> (B) hdfs-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.replication</name>
>>>>>> >> >>     <value>1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>> >> >>     <value>10</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (C) mapred-site.xml
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>> >> >>     <description>No description</description>
>>>>>> >> >>     <final>true</final>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>> >> >> </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>> >> >>     <value>yarn</value>
>>>>>> >> >>    </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>> >> >>     <value>8</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>> >> >>     <value>4</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>> >> >>     <value>true</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> (D) yarn-site.xml
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>> >> >>     <value>node1:18025</value>
>>>>>> >> >>     <description>host is the hostname of the resource manager and
>>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>>> >> >>     Resource Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <description>The address of the RM web application.</description>
>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>> >> >>     <value>node1:18088</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>> >> >>     <value>node1:18030</value>
>>>>>> >> >>     <description>host is the hostname of the resourcemanager and
>>>>>> >> >>     port is the port on which the Applications in the cluster talk
>>>>>> >> >>     to the Resource Manager.
>>>>>> >> >>     </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>> >> >>     <value>node1:18040</value>
>>>>>> >> >>     <description>the host is the hostname of the ResourceManager
>>>>>> >> >>     and the port is the port on which the clients can talk to the
>>>>>> >> >>     Resource Manager. </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>> >> >>     <description>the local directories used by the
>>>>>> >> >>     nodemanager</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>> >> >>     <description>the nodemanagers bind to this port</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>10240</value>
>>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>>> >> >>     MB</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>> >> >>     <description>directory on hdfs where the application logs are
>>>>>> >> >>     moved to </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>    <property>
>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>>> >> >>     directories</description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>>> >> >>     Reduce to run </description>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>   <property>
>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>> >> >>     <value>64</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>> >> >>     <value>24</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >> <property>
>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>> >> >>     <value>3</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>> >> >>     <value>22000</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>  <property>
>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>> >> >>     <value>2.1</value>
>>>>>> >> >>   </property>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> 2013/6/7 Harsh J <harsh@cloudera.com>
>>>>>> >> >>>
>>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>>> >> >>> resource based scheduling, and hence MR2 would be requesting 1 GB
>>>>>> >> >>> minimum by default, causing it, on base configs, to max out at 8
>>>>>> >> >>> total containers (due to the 8 GB NM memory resource config). Do
>>>>>> >> >>> share your configs, as at this point none of us can tell what
>>>>>> >> >>> they are.
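>>>>>> >> >>>
>>>>>> >> >>> (With the stock yarn.nodemanager.resource.memory-mb of 8192 and
>>>>>> >> >>> 1 GB requests, that is roughly 8192 / 1024 = 8 concurrent
>>>>>> >> >>> containers per node, assuming I have the defaults right.)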
>>>>>> >> >>>
>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>>> >> >>> to not care about such things :)
>>>>>> >> >>>
>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>>> >> >>> > MRv1 and Yarn. But they have many differences, and to be fair
>>>>>> >> >>> > for the comparison I did not tune their configurations at all.
>>>>>> >> >>> > So I got the above test results. After analyzing the test
>>>>>> >> >>> > results, no doubt, I will configure them and do the comparison
>>>>>> >> >>> > again.
>>>>>> >> >>> >
>>>>>> >> >>> > Do you have any idea about the current test results? I think,
>>>>>> >> >>> > compared with MRv1, Yarn is better in the Map phase (teragen
>>>>>> >> >>> > test), but worse in the Reduce phase (terasort test).
>>>>>> >> >>> > And are there any detailed suggestions/comments/materials on
>>>>>> >> >>> > Yarn performance tuning?
>>>>>> >> >>> >
>>>>>> >> >>> > Thanks!
>>>>>> >> >>> >
>>>>>> >> >>> >
>>>>>> >> >>> >> 2013/6/7 Marcos Luis Ortiz Valmaseda <marcosluis2186@gmail.com>
>>>>>> >> >>> >>
>>>>>> >> >>> >> Why not tune the configurations?
>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc.
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> 2013/6/6 sam liu <samliuhadoop@gmail.com>
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Hi Experts,
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the near
>>>>>> >> >>> >>> future, and I ran teragen/terasort on Yarn and MRv1 for
>>>>>> >> >>> >>> comparison.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> My environment is a three-node cluster, and each node has
>>>>>> >> >>> >>> similar hardware: 2 CPUs (4 cores), 32 GB memory. Both the
>>>>>> >> >>> >>> Yarn and MRv1 clusters are set up on the same environment. To
>>>>>> >> >>> >>> be fair, I did not do any performance tuning on their
>>>>>> >> >>> >>> configurations, but used the default configuration values.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Before testing, I thought Yarn would be much better than
>>>>>> >> >>> >>> MRv1 if they both used the default configuration, because
>>>>>> >> >>> >>> Yarn is a better framework than MRv1.
>>>>>> >> >>> >>> However, the test results show some differences:
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> After a quick analysis, I think the direct cause might be
>>>>>> >> >>> >>> that Yarn is much faster than MRv1 in the Map phase, but much
>>>>>> >> >>> >>> worse in the Reduce phase.
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Here I have two questions:
>>>>>> >> >>> >>> - Why do my tests show Yarn is worse than MRv1 for terasort?
>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there
>>>>>> >> >>> >>> any materials?
>>>>>> >> >>> >>>
>>>>>> >> >>> >>> Thanks!
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >>
>>>>>> >> >>> >> --
>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>> >> >>> >>
>>>>>> >> >>> >
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> --
>>>>>> >> >>> Harsh J
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Harsh J
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
