hadoop-mapreduce-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Date Wed, 23 Oct 2013 20:17:39 GMT
Increasing the slowstart is not meant to increase performance, but should
make for a fairer comparison.  Have you tried making sure that in MR2 only
8 map tasks are running concurrently, or boosting MR1 up to 16?
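
For illustration (a sketch only, using the 12288 MB
yarn.nodemanager.resource.memory-mb value quoted below), one way to cap MR2
at roughly 8 concurrent map tasks per node is to raise the per-map container
request so that only 8 fit:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value><!-- 12288 / 1536 = 8 map containers per node -->
  </property>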

-Sandy


On Wed, Oct 23, 2013 at 12:55 PM, Jian Fang
<jian.fang.subscribe@gmail.com>wrote:

> Changing mapreduce.job.reduce.slowstart.completedmaps to 0.99 does not
> look good. The map phase alone took 48 minutes and the total time seems to
> be even longer. Is there any way to make the map phase run faster?
>
> Thanks.
>
>
> On Wed, Oct 23, 2013 at 10:05 AM, Jian Fang <jian.fang.subscribe@gmail.com
> > wrote:
>
>> Thanks Sandy.
>>
>> io.sort.record.percent is at its default value of 0.05 for both MR1 and
>> MR2. mapreduce.job.reduce.slowstart.completedmaps in MR2 and
>> mapred.reduce.slowstart.completed.maps in MR1 both use the default value
>> of 0.05.
>>
>> I tried allocating 1536 MB and 1024 MB to the map containers some time
>> ago, but neither change gave a better result, so I changed it back to
>> 768 MB.
>>
>> Will try mapred.reduce.slowstart.completed.maps=.99 to see what happens.
>> BTW, I should use
>> mapreduce.job.reduce.slowstart.completedmaps in MR2, right?
>>
>> Also, in MR1 I can specify tasktracker.http.threads, but I could not find
>> its counterpart for MR2. Which property should I tune for the HTTP threads?
>>
>> Thanks again.
>>
>>
>> On Wed, Oct 23, 2013 at 9:40 AM, Sandy Ryza <sandy.ryza@cloudera.com>wrote:
>>
>>> Based on SLOTS_MILLIS_MAPS, it looks like your map tasks are taking
>>> about three times as long in MR2 as they are in MR1.  This is probably
>>> because you allow twice as many map tasks to run at a time in MR2 (12288/768
>>> = 16).  Being able to use all the containers isn't necessarily a good thing
>>> if you are oversubscribing your node's resources.  Because of the different
>>> way that MR1 and MR2 view resources, I think it's better to test with
>>> mapred.reduce.slowstart.completed.maps=.99 so that the map and reduce
>>> phases will run separately.
>>>
>>> On the other side, it looks like your MR1 job has more spilled records
>>> than MR2.  You should set io.sort.record.percent in MR1 to .13, which
>>> should improve MR1 performance and provide a fairer comparison (MR2 does
>>> this tuning for you automatically).
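>>>
>>> For illustration (a sketch only, using the MR1 property names), the two
>>> settings above would go into mapred-site.xml as:
>>>
>>>   <property>
>>>     <name>mapred.reduce.slowstart.completed.maps</name>
>>>     <value>0.99</value>
>>>   </property>
>>>   <property>
>>>     <name>io.sort.record.percent</name>
>>>     <value>0.13</value>
>>>   </property>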
>>>
>>> -Sandy
>>>
>>>
>>> On Wed, Oct 23, 2013 at 9:22 AM, Jian Fang <
>>> jian.fang.subscribe@gmail.com> wrote:
>>>
>>>> The numbers of map slots and reduce slots on each data node for MR1 are
>>>> 8 and 3, respectively. Since MR2 can use all containers for either map
>>>> or reduce, I would expect MR2 to be faster.
>>>>
>>>>
>>>> On Wed, Oct 23, 2013 at 8:17 AM, Sandy Ryza <sandy.ryza@cloudera.com>wrote:
>>>>
>>>>> How many map and reduce slots are you using per tasktracker in MR1?
>>>>>  How do the average map times compare? (MR2 reports this directly on the
>>>>> web UI, but you can also get a sense in MR1 by scrolling through the map
>>>>> tasks page).  Can you share the counters for MR1?
>>>>>
>>>>> -Sandy
>>>>>
>>>>>
>>>>> On Wed, Oct 23, 2013 at 12:23 AM, Jian Fang <
>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>
>>>>>> Unfortunately, turning off JVM reuse still gave the same result, i.e.,
>>>>>> about 90 minutes for MR2. I don't think the killed reduces could account
>>>>>> for a 2x slowdown. There must be something very wrong in either the
>>>>>> configuration or the code. Any hints?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 22, 2013 at 5:50 PM, Jian Fang <
>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Sandy. I will try turning JVM reuse off and see what happens.
>>>>>>>
>>>>>>> Yes, I saw quite a few exceptions in the task attempts. For instance:
>>>>>>>
>>>>>>>
>>>>>>> 2013-10-20 03:13:58,751 ERROR [main]
>>>>>>> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
>>>>>>> as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
>>>>>>> 2013-10-20 03:13:58,752 ERROR [Thread-6]
>>>>>>> org.apache.hadoop.hdfs.DFSClient: Failed to close file
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
>>>>>>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>>>>>>> No lease on
>>>>>>> /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200:
>>>>>>> File does not exist. Holder
>>>>>>> DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have
>>>>>>> any open files.
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
>>>>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
>>>>>>>         at
>>>>>>> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
>>>>>>> --
>>>>>>>         at com.sun.proxy.$Proxy10.complete(Unknown Source)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
>>>>>>>         at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
>>>>>>>         at
>>>>>>> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
>>>>>>> 2013-10-20 03:13:58,753 WARN [main]
>>>>>>> org.apache.hadoop.mapred.YarnChild: Exception running child :
>>>>>>> java.nio.channels.ClosedChannelException
>>>>>>>         at
>>>>>>> org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
>>>>>>>         at
>>>>>>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
>>>>>>>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>>>>>>>         at
>>>>>>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>>>>>>>         at
>>>>>>> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sandy.ryza@cloudera.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> It looks like many of your reduce tasks were killed.  Do you know
>>>>>>>> why?  Also, MR2 doesn't have JVM reuse, so it might make sense to compare
>>>>>>>> it to MR1 with JVM reuse turned off.
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <
>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The Terasort output for MR2 is as follows.
>>>>>>>>>
>>>>>>>>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job
>>>>>>>>> (main): Counters: 46
>>>>>>>>>         File System Counters
>>>>>>>>>                 FILE: Number of bytes read=456102049355
>>>>>>>>>                 FILE: Number of bytes written=897246250517
>>>>>>>>>                 FILE: Number of read operations=0
>>>>>>>>>                 FILE: Number of large read operations=0
>>>>>>>>>                 FILE: Number of write operations=0
>>>>>>>>>                 HDFS: Number of bytes read=1000000851200
>>>>>>>>>                 HDFS: Number of bytes written=1000000000000
>>>>>>>>>                 HDFS: Number of read operations=32131
>>>>>>>>>                 HDFS: Number of large read operations=0
>>>>>>>>>                 HDFS: Number of write operations=224
>>>>>>>>>         Job Counters
>>>>>>>>>                 Killed map tasks=1
>>>>>>>>>                 Killed reduce tasks=20
>>>>>>>>>                 Launched map tasks=7601
>>>>>>>>>                 Launched reduce tasks=132
>>>>>>>>>                 Data-local map tasks=7591
>>>>>>>>>                 Rack-local map tasks=10
>>>>>>>>>                 Total time spent by all maps in occupied slots
>>>>>>>>> (ms)=1696141311
>>>>>>>>>                 Total time spent by all reduces in occupied slots
>>>>>>>>> (ms)=2664045096
>>>>>>>>>         Map-Reduce Framework
>>>>>>>>>                 Map input records=10000000000
>>>>>>>>>                 Map output records=10000000000
>>>>>>>>>                 Map output bytes=1020000000000
>>>>>>>>>                 Map output materialized bytes=440486356802
>>>>>>>>>                 Input split bytes=851200
>>>>>>>>>                 Combine input records=0
>>>>>>>>>                 Combine output records=0
>>>>>>>>>                 Reduce input groups=10000000000
>>>>>>>>>                 Reduce shuffle bytes=440486356802
>>>>>>>>>                 Reduce input records=10000000000
>>>>>>>>>                 Reduce output records=10000000000
>>>>>>>>>                 Spilled Records=20000000000
>>>>>>>>>                 Shuffled Maps =851200
>>>>>>>>>                 Failed Shuffles=61
>>>>>>>>>                 Merged Map outputs=851200
>>>>>>>>>                 GC time elapsed (ms)=4215666
>>>>>>>>>                 CPU time spent (ms)=192433000
>>>>>>>>>                 Physical memory (bytes) snapshot=3349356380160
>>>>>>>>>                 Virtual memory (bytes) snapshot=9665208745984
>>>>>>>>>                 Total committed heap usage (bytes)=3636854259712
>>>>>>>>>         Shuffle Errors
>>>>>>>>>                 BAD_ID=0
>>>>>>>>>                 CONNECTION=0
>>>>>>>>>                 IO_ERROR=4
>>>>>>>>>                 WRONG_LENGTH=0
>>>>>>>>>                 WRONG_MAP=0
>>>>>>>>>                 WRONG_REDUCE=0
>>>>>>>>>         File Input Format Counters
>>>>>>>>>                 Bytes Read=1000000000000
>>>>>>>>>         File Output Format Counters
>>>>>>>>>                 Bytes Written=1000000000000
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <
>>>>>>>>> jian.fang.subscribe@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop
>>>>>>>>>> 1.0.3, and it turned out that terasort on MR2 is two times slower
>>>>>>>>>> than on MR1. I can hardly believe it.
>>>>>>>>>>
>>>>>>>>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0
>>>>>>>>>> cluster configurations are as follows.
>>>>>>>>>>
>>>>>>>>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>>>>>>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>>>>>>>>         mapreduce.map.memory.mb = "768";
>>>>>>>>>>         mapreduce.reduce.memory.mb = "2048";
>>>>>>>>>>
>>>>>>>>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>>>>>>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>>>>>>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>>>>>>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>>>>>>>>         mapreduce.task.io.sort.factor = "48";
>>>>>>>>>>         mapreduce.task.io.sort.mb = "200";
>>>>>>>>>>         mapreduce.map.speculative = "true";
>>>>>>>>>>         mapreduce.reduce.speculative = "true";
>>>>>>>>>>         mapreduce.framework.name = "yarn";
>>>>>>>>>>         yarn.app.mapreduce.am.job.task.listener.thread-count =
>>>>>>>>>> "60";
>>>>>>>>>>         mapreduce.map.cpu.vcores = "1";
>>>>>>>>>>         mapreduce.reduce.cpu.vcores = "2";
>>>>>>>>>>
>>>>>>>>>>         mapreduce.job.jvm.numtasks = "20";
>>>>>>>>>>         mapreduce.map.output.compress = "true";
>>>>>>>>>>         mapreduce.map.output.compress.codec =
>>>>>>>>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>>>>>>>>
>>>>>>>>>>         yarn.resourcemanager.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>>>>>>>>         yarn.resourcemanager.resource-tracker.client.thread-count
>>>>>>>>>> = "64";
>>>>>>>>>>         yarn.resourcemanager.scheduler.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>>>>>>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>>>>>>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>>>>>>>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>>>>>>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>>>>>>>>         yarn.nodemanager.container-executor.class =
>>>>>>>>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>>>>>>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>>>>>>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>>>>>>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>>>>>>>>
>>>>>>>>>> My Hadoop 1.0.3 cluster has the same memory/disks/cores and almost
>>>>>>>>>> the same other configuration. In MR1, the 1 TB terasort took about
>>>>>>>>>> 45 minutes, but it took around 90 minutes in MR2.
>>>>>>>>>>
>>>>>>>>>> Does anyone know what is wrong here? Or do I need some special
>>>>>>>>>> configurations for terasort to work better in MR2?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <
>>>>>>>>>> michael_segel@hotmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sam,
>>>>>>>>>>> I think your cluster is too small for any meaningful conclusions
>>>>>>>>>>> to be made.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sent from a remote device. Please excuse any typos...
>>>>>>>>>>>
>>>>>>>>>>> Mike Segel
>>>>>>>>>>>
>>>>>>>>>>> On Jun 18, 2013, at 3:58 AM, sam liu <samliuhadoop@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Harsh,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your detailed response! Now, the efficiency of my
>>>>>>>>>>> Yarn cluster has improved a lot after increasing the number of
>>>>>>>>>>> reducers (mapreduce.job.reduces) in mapred-site.xml. But I still
>>>>>>>>>>> have some questions about the way Yarn executes an MRv1 job:
>>>>>>>>>>>
>>>>>>>>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>>>>>>>>> together, with a typical process (map > shuffle > reduce). In Yarn,
>>>>>>>>>>> as I understand it, an MRv1 job is executed only by the
>>>>>>>>>>> ApplicationMaster.
>>>>>>>>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1
>>>>>>>>>>> job has a special execution process (map > shuffle > reduce) in
>>>>>>>>>>> Hadoop 1.x. How does Yarn execute an MRv1 job? Does it still
>>>>>>>>>>> include the MR-specific steps from Hadoop 1.x, like map, sort,
>>>>>>>>>>> merge, combine and shuffle?
>>>>>>>>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>>>>>>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>>>>>>>>> - What's the general process by which the Yarn ApplicationMaster
>>>>>>>>>>> executes a job?
>>>>>>>>>>>
>>>>>>>>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>>>>>>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>>>>>>>>> 'mapred.tasktracker.reduce.tasks.maximum'.
>>>>>>>>>>> - For Yarn, the above two parameters do not work anymore, as Yarn
>>>>>>>>>>> uses containers instead, right?
>>>>>>>>>>> - For Yarn, we can set the total physical memory for a NodeManager
>>>>>>>>>>> using 'yarn.nodemanager.resource.memory-mb'. But how do we set the
>>>>>>>>>>> default physical memory size of a container?
>>>>>>>>>>> - How do we set the maximum physical memory size of a container?
>>>>>>>>>>> With the parameter 'mapred.child.java.opts'?
>>>>>>>>>>>
>>>>>>>>>>> Thanks as always!
>>>>>>>>>>>
>>>>>>>>>>> 2013/6/9 Harsh J <harsh@cloudera.com>
>>>>>>>>>>>
>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>
>>>>>>>>>>>> > - How can I know the container number? Why do you say it will
>>>>>>>>>>>> > be 22 containers due to 22 GB of memory?
>>>>>>>>>>>>
>>>>>>>>>>>> MR2's default configuration requests 1 GB of resources each for
>>>>>>>>>>>> map and reduce containers, plus an additional 1.5 GB for the AM
>>>>>>>>>>>> container that runs the job. This is tunable using the properties
>>>>>>>>>>>> Sandy mentioned in his post.
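>>>>>>>>>>>>
>>>>>>>>>>>> For illustration (a sketch of those defaults, assuming the usual
>>>>>>>>>>>> MR2 property names):
>>>>>>>>>>>>
>>>>>>>>>>>>   mapreduce.map.memory.mb           = 1024
>>>>>>>>>>>>   mapreduce.reduce.memory.mb        = 1024
>>>>>>>>>>>>   yarn.app.mapreduce.am.resource.mb = 1536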
>>>>>>>>>>>>
>>>>>>>>>>>> > - My machine has 32 GB of memory; how much memory is proper to
>>>>>>>>>>>> > assign to containers?
>>>>>>>>>>>>
>>>>>>>>>>>> This is a general question. You may use the same process you
>>>>>>>>>>>> used to decide the optimal number of slots in MR1 to decide this
>>>>>>>>>>>> here. Every container is a new JVM, and you're limited by the
>>>>>>>>>>>> CPUs you have there (if not the memory). Either increase the
>>>>>>>>>>>> memory requests from jobs, to lower the number of concurrent
>>>>>>>>>>>> containers at a given time (a runtime change), or lower the NM's
>>>>>>>>>>>> published memory resources to control the same (a config change).
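>>>>>>>>>>>>
>>>>>>>>>>>> For illustration (a rough sketch using the numbers in this
>>>>>>>>>>>> thread): with yarn.nodemanager.resource.memory-mb = 22000 and the
>>>>>>>>>>>> default 1024 MB per task, each NM fits about 22000 / 1024 = 21
>>>>>>>>>>>> task containers, plus the 1536 MB AM on one node. The second
>>>>>>>>>>>> option, lowering the NM's published memory in yarn-site.xml,
>>>>>>>>>>>> would look like:
>>>>>>>>>>>>
>>>>>>>>>>>>   <property>
>>>>>>>>>>>>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>>     <value>12000</value><!-- roughly 11-12 concurrent 1 GB containers -->
>>>>>>>>>>>>   </property>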
>>>>>>>>>>>>
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, all of these properties will still work. Old properties
>>>>>>>>>>>> specific
>>>>>>>>>>>> to JobTracker or TaskTracker (usually found as a keyword in the
>>>>>>>>>>>> config
>>>>>>>>>>>> name) will not apply anymore.
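>>>>>>>>>>>>
>>>>>>>>>>>> For example (an illustrative mapred-site.xml fragment; the values
>>>>>>>>>>>> are just examples):
>>>>>>>>>>>>
>>>>>>>>>>>>   <!-- still applies under YARN -->
>>>>>>>>>>>>   <property>
>>>>>>>>>>>>     <name>mapreduce.task.io.sort.mb</name>
>>>>>>>>>>>>     <value>200</value>
>>>>>>>>>>>>   </property>
>>>>>>>>>>>>   <!-- TaskTracker-specific, so ignored under YARN -->
>>>>>>>>>>>>   <property>
>>>>>>>>>>>>     <name>mapred.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>>     <value>8</value>
>>>>>>>>>>>>   </property>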
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <samliuhadoop@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> > Hi Harsh,
>>>>>>>>>>>> >
>>>>>>>>>>>> > Following the suggestions above, I removed the duplicated
>>>>>>>>>>>> > setting and reduced the values of
>>>>>>>>>>>> > 'yarn.nodemanager.resource.cpu-cores',
>>>>>>>>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>>>>>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And
>>>>>>>>>>>> > then the efficiency improved by about 18%.  I have some questions:
>>>>>>>>>>>> >
>>>>>>>>>>>> > - How can I know the container number? Why do you say it will
>>>>>>>>>>>> > be 22 containers due to 22 GB of memory?
>>>>>>>>>>>> > - My machine has 32 GB of memory; how much memory is proper to
>>>>>>>>>>>> > assign to containers?
>>>>>>>>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to
>>>>>>>>>>>> be 'yarn', will
>>>>>>>>>>>> > other parameters for mapred-site.xml still work in yarn
>>>>>>>>>>>> framework? Like
>>>>>>>>>>>> > 'mapreduce.task.io.sort.mb' and
>>>>>>>>>>>> 'mapreduce.map.sort.spill.percent'
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks!
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > 2013/6/8 Harsh J <harsh@cloudera.com>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Hey Sam,
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Did you get a chance to retry with Sandy's suggestions? The
>>>>>>>>>>>> >> config appears to be asking NMs to use roughly 22 total
>>>>>>>>>>>> >> containers (as opposed to 12 total tasks in the MR1 config)
>>>>>>>>>>>> >> due to a 22 GB memory resource. This could have a big impact,
>>>>>>>>>>>> >> given the CPU is still the same for both test runs.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <
>>>>>>>>>>>> sandy.ryza@cloudera.com>
>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>> >> > Hey Sam,
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Thanks for sharing your results.  I'm definitely curious
>>>>>>>>>>>> about what's
>>>>>>>>>>>> >> > causing the difference.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > A couple observations:
>>>>>>>>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb
>>>>>>>>>>>> >> > in there twice with two different values.
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to
>>>>>>>>>>>> the default
>>>>>>>>>>>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your
>>>>>>>>>>>> tasks getting
>>>>>>>>>>>> >> > killed for running over resource limits?
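>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > For illustration (a sketch only, not a required setting), one
>>>>>>>>>>>> >> > common way to leave headroom is to keep the container request
>>>>>>>>>>>> >> > well above the JVM heap, e.g. in mapred-site.xml:
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> >   <property>
>>>>>>>>>>>> >> >     <name>mapreduce.map.memory.mb</name>
>>>>>>>>>>>> >> >     <value>1536</value><!-- vs. a -Xmx1000m heap -->
>>>>>>>>>>>> >> >   </property>
>>>>>>>>>>>> >> >   <property>
>>>>>>>>>>>> >> >     <name>mapreduce.reduce.memory.mb</name>
>>>>>>>>>>>> >> >     <value>1536</value>
>>>>>>>>>>>> >> >   </property>
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > so native JVM overhead does not push a task over its
>>>>>>>>>>>> >> > container limit.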
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > -Sandy
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com> wrote:
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> The terasort execution log shows that the reduce phase
>>>>>>>>>>>> >> >> spent about 5.5 minutes going from 33% to 35%, as shown
>>>>>>>>>>>> >> >> below.
>>>>>>>>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>>>>>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>>>>>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>>>>>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>>>>>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>>>>>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> Anyway, below are my configurations for your reference.
>>>>>>>>>>>> >> >> Thanks!
>>>>>>>>>>>> >> >> (A) core-site.xml
>>>>>>>>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (B) hdfs-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.replication</name>
>>>>>>>>>>>> >> >>     <value>1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.name.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.data.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.block.size</name>
>>>>>>>>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>>>>>>>>> >> >>     <value>10</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (C) mapred-site.xml
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>>>>>>>>> >> >>     <description>No description</description>
>>>>>>>>>>>> >> >>     <final>true</final>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>>>>>>>>> >> >>   <value>-Xmx1000m</value>
>>>>>>>>>>>> >> >> </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>>>>>>>>> >> >>     <value>yarn</value>
>>>>>>>>>>>> >> >>    </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>8</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>>>>>>>>> >> >>     <value>4</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>>>>>>>>> >> >>     <value>true</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> (D) yarn-site.xml
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18025</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the resource
>>>>>>>>>>>> manager and
>>>>>>>>>>>> >> >>     port is the port on which the NodeManagers contact
>>>>>>>>>>>> the Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <description>The address of the RM web
>>>>>>>>>>>> application.</description>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18088</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18030</value>
>>>>>>>>>>>> >> >>     <description>host is the hostname of the
>>>>>>>>>>>> resourcemanager and port
>>>>>>>>>>>> >> >> is
>>>>>>>>>>>> >> >> the port
>>>>>>>>>>>> >> >>     on which the Applications in the cluster talk to the
>>>>>>>>>>>> Resource
>>>>>>>>>>>> >> >> Manager.
>>>>>>>>>>>> >> >>     </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>>>>>>>>> >> >>     <value>node1:18040</value>
>>>>>>>>>>>> >> >>     <description>the host is the hostname of the
>>>>>>>>>>>> ResourceManager and
>>>>>>>>>>>> >> >> the
>>>>>>>>>>>> >> >> port is the port on
>>>>>>>>>>>> >> >>     which the clients can talk to the Resource Manager.
>>>>>>>>>>>> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>>>>>>>>> >> >>     <description>the local directories used by the
>>>>>>>>>>>> >> >> nodemanager</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>>>>>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>>>>>>>>> >> >>     <description>the nodemanagers bind to this
>>>>>>>>>>>> port</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>10240</value>
>>>>>>>>>>>> >> >>     <description>the amount of memory on the NodeManager
>>>>>>>>>>>> in
>>>>>>>>>>>> >> >> GB</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>>>>>>>>> >> >>     <description>directory on hdfs where the application
>>>>>>>>>>>> logs are moved
>>>>>>>>>>>> >> >> to
>>>>>>>>>>>> >> >> </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>    <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>>>>>>>>> >> >>     <description>the directories used by Nodemanagers as
>>>>>>>>>>>> log
>>>>>>>>>>>> >> >> directories</description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>>>>>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>>>>>>>>> >> >>     <description>shuffle service that needs to be set for
>>>>>>>>>>>> Map Reduce to
>>>>>>>>>>>> >> >> run </description>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>   <property>
>>>>>>>>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>>>>>>>>> >> >>     <value>64</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>>>>>>>>> >> >>     <value>24</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>>>>>>>>> >> >>     <value>3</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>>>>>>>>> >> >>     <value>22000</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>  <property>
>>>>>>>>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>>>>>>>>> >> >>     <value>2.1</value>
>>>>>>>>>>>> >> >>   </property>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> 2013/6/7 Harsh J <harsh@cloudera.com>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Not tuning the configurations at all is wrong. YARN uses
>>>>>>>>>>>> >> >>> memory-resource-based scheduling, and hence MR2 would be
>>>>>>>>>>>> >> >>> requesting 1 GB minimum by default, causing it, on the base
>>>>>>>>>>>> >> >>> configs, to max out at 8 total containers (due to the 8 GB
>>>>>>>>>>>> >> >>> NM memory resource config). Do share your configs, as at
>>>>>>>>>>>> >> >>> this point none of us can tell what the problem is.
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for
>>>>>>>>>>>> users and to not
>>>>>>>>>>>> >> >>> care about such things :)
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <
>>>>>>>>>>>> samliuhadoop@gmail.com>
>>>>>>>>>>>> >> >>> wrote:
>>>>>>>>>>>> >> >>> > At the beginning, I just wanted to do a quick comparison
>>>>>>>>>>>> >> >>> > of MRv1 and Yarn. But they have many differences, and to
>>>>>>>>>>>> >> >>> > be fair I did not tune their configurations at all. That
>>>>>>>>>>>> >> >>> > is how I got the above test results. After analyzing the
>>>>>>>>>>>> >> >>> > results, I will, no doubt, configure them and do the
>>>>>>>>>>>> >> >>> > comparison again.
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Do you have any ideas about the current test results? I
>>>>>>>>>>>> >> >>> > think that, compared with MRv1, Yarn is better in the map
>>>>>>>>>>>> >> >>> > phase (teragen test), but worse in the reduce phase
>>>>>>>>>>>> >> >>> > (terasort test). And do you have any detailed
>>>>>>>>>>>> >> >>> > suggestions/comments/materials on Yarn performance
>>>>>>>>>>>> >> >>> > tuning?
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > Thanks!
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>>>>>>>>> marcosluis2186@gmail.com>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> Why not tune the configurations?
>>>>>>>>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>>>>>>>>> >> >>> >> - Combiners, shuffle optimization, block size, etc.
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> 2013/6/6 sam liu <samliuhadoop@gmail.com>
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Hi Experts,
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> We are thinking about whether or not to use Yarn in
>>>>>>>>>>>> >> >>> >>> the near future, and I ran teragen/terasort on Yarn and
>>>>>>>>>>>> >> >>> >>> MRv1 for comparison.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> My environment is a three-node cluster, and each node
>>>>>>>>>>>> >> >>> >>> has similar hardware: 2 CPUs (4 cores) and 32 GB of
>>>>>>>>>>>> >> >>> >>> memory. Both the Yarn and MRv1 clusters are set up on
>>>>>>>>>>>> >> >>> >>> the same environment. To be fair, I did not do any
>>>>>>>>>>>> >> >>> >>> performance tuning on their configurations, but used
>>>>>>>>>>>> >> >>> >>> the default configuration values.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Before testing, I thought Yarn would be much better
>>>>>>>>>>>> >> >>> >>> than MRv1 if both used the default configuration,
>>>>>>>>>>>> >> >>> >>> because Yarn is a better framework than MRv1.
>>>>>>>>>>>> >> >>> >>> However, the test results show some differences:
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>>>>>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 193 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 69 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>>>>>>>>> >> >>> >>> - MRv1: 451 sec
>>>>>>>>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>>>>>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> After a quick analysis, I think the direct cause might
>>>>>>>>>>>> >> >>> >>> be that Yarn is much faster than MRv1 in the map phase,
>>>>>>>>>>>> >> >>> >>> but much worse in the reduce phase.
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Here I have two questions:
>>>>>>>>>>>> >> >>> >>> - Why do my tests show that Yarn is worse than MRv1 for
>>>>>>>>>>>> >> >>> >>> terasort?
>>>>>>>>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are
>>>>>>>>>>>> >> >>> >>> there any materials?
>>>>>>>>>>>> >> >>> >>>
>>>>>>>>>>>> >> >>> >>> Thanks!
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >> --
>>>>>>>>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>>>>>>>>> >> >>> >> Product Manager at PDVSA
>>>>>>>>>>>> >> >>> >> http://about.me/marcosortiz
>>>>>>>>>>>> >> >>> >>
>>>>>>>>>>>> >> >>> >
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>>
>>>>>>>>>>>> >> >>> --
>>>>>>>>>>>> >> >>> Harsh J
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> --
>>>>>>>>>>>> >> Harsh J
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Harsh J
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
