hadoop-user mailing list archives

From Jian Fang <jian.fang.subscr...@gmail.com>
Subject Re: Why my tests shows Yarn is worse than MRv1 for terasort?
Date Wed, 23 Oct 2013 00:50:29 GMT
Thanks Sandy. I will try to turn JVM reuse off and see what happens.
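If I understand it correctly, JVM reuse in MR1 is controlled by the
mapred.job.reuse.jvm.num.tasks property, so turning it off should just be the
following mapred-site.xml snippet (a sketch from memory, not yet verified on my
cluster; 1 means no reuse, -1 means unlimited reuse):

  <property>
    <!-- number of tasks a map/reduce JVM may run before it is replaced -->
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>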

Yes, I saw quite a few exceptions in the task attempts. For instance:


2013-10-20 03:13:58,751 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hadoop (auth:SIMPLE) cause:java.nio.channels.ClosedChannelException
2013-10-20 03:13:58,752 ERROR [Thread-6] org.apache.hadoop.hdfs.DFSClient: Failed to close file /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /1-tb-data/_temporary/1/_temporary/attempt_1382237301855_0001_m_000200_1/part-m-00200: File does not exist. Holder DFSClient_attempt_1382237301855_0001_m_000200_1_872378586_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2737)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2801)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2783)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:611)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:429)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:48077)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:582)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
--
        at com.sun.proxy.$Proxy10.complete(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:371)
        at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1910)
        at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1896)
        at org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:773)
        at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:790)
        at org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:847)
        at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2526)
        at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2551)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
2013-10-20 03:13:58,753 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.nio.channels.ClosedChannelException
        at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1325)
        at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98)
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:61)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:69)
        at org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:57)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:646)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)



On Tue, Oct 22, 2013 at 4:45 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> It looks like many of your reduce tasks were killed.  Do you know why?
>  Also, MR2 doesn't have JVM reuse, so it might make sense to compare it to
> MR1 with JVM reuse turned off.
>
> -Sandy
>
>
> On Tue, Oct 22, 2013 at 3:06 PM, Jian Fang <jian.fang.subscribe@gmail.com> wrote:
>
>> The Terasort output for MR2 is as follows.
>>
>> 2013-10-22 21:40:16,261 INFO org.apache.hadoop.mapreduce.Job (main):
>> Counters: 46
>>         File System Counters
>>                 FILE: Number of bytes read=456102049355
>>                 FILE: Number of bytes written=897246250517
>>                 FILE: Number of read operations=0
>>                 FILE: Number of large read operations=0
>>                 FILE: Number of write operations=0
>>                 HDFS: Number of bytes read=1000000851200
>>                 HDFS: Number of bytes written=1000000000000
>>                 HDFS: Number of read operations=32131
>>                 HDFS: Number of large read operations=0
>>                 HDFS: Number of write operations=224
>>         Job Counters
>>                 Killed map tasks=1
>>                 Killed reduce tasks=20
>>                 Launched map tasks=7601
>>                 Launched reduce tasks=132
>>                 Data-local map tasks=7591
>>                 Rack-local map tasks=10
>>                 Total time spent by all maps in occupied slots (ms)=1696141311
>>                 Total time spent by all reduces in occupied slots (ms)=2664045096
>>         Map-Reduce Framework
>>                 Map input records=10000000000
>>                 Map output records=10000000000
>>                 Map output bytes=1020000000000
>>                 Map output materialized bytes=440486356802
>>                 Input split bytes=851200
>>                 Combine input records=0
>>                 Combine output records=0
>>                 Reduce input groups=10000000000
>>                 Reduce shuffle bytes=440486356802
>>                 Reduce input records=10000000000
>>                 Reduce output records=10000000000
>>                 Spilled Records=20000000000
>>                 Shuffled Maps =851200
>>                 Failed Shuffles=61
>>                 Merged Map outputs=851200
>>                 GC time elapsed (ms)=4215666
>>                 CPU time spent (ms)=192433000
>>                 Physical memory (bytes) snapshot=3349356380160
>>                 Virtual memory (bytes) snapshot=9665208745984
>>                 Total committed heap usage (bytes)=3636854259712
>>         Shuffle Errors
>>                 BAD_ID=0
>>                 CONNECTION=0
>>                 IO_ERROR=4
>>                 WRONG_LENGTH=0
>>                 WRONG_MAP=0
>>                 WRONG_REDUCE=0
>>         File Input Format Counters
>>                 Bytes Read=1000000000000
>>         File Output Format Counters
>>                 Bytes Written=1000000000000
>>
>> Thanks,
>>
>> John
>>
>>
>>
>> On Tue, Oct 22, 2013 at 2:44 PM, Jian Fang <jian.fang.subscribe@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3, and
>>> it turned out that terasort in MR2 is 2 times slower than in MR1. I can
>>> hardly believe it.
>>>
>>> The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
>>> configurations are as follows.
>>>
>>>         mapreduce.map.java.opts = "-Xmx512m";
>>>         mapreduce.reduce.java.opts = "-Xmx1536m";
>>>         mapreduce.map.memory.mb = "768";
>>>         mapreduce.reduce.memory.mb = "2048";
>>>
>>>         yarn.scheduler.minimum-allocation-mb = "256";
>>>         yarn.scheduler.maximum-allocation-mb = "8192";
>>>         yarn.nodemanager.resource.memory-mb = "12288";
>>>         yarn.nodemanager.resource.cpu-vcores = "16";
>>>
>>>         mapreduce.reduce.shuffle.parallelcopies = "20";
>>>         mapreduce.task.io.sort.factor = "48";
>>>         mapreduce.task.io.sort.mb = "200";
>>>         mapreduce.map.speculative = "true";
>>>         mapreduce.reduce.speculative = "true";
>>>         mapreduce.framework.name = "yarn";
>>>         yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
>>>         mapreduce.map.cpu.vcores = "1";
>>>         mapreduce.reduce.cpu.vcores = "2";
>>>
>>>         mapreduce.job.jvm.numtasks = "20";
>>>         mapreduce.map.output.compress = "true";
>>>         mapreduce.map.output.compress.codec =
>>> "org.apache.hadoop.io.compress.SnappyCodec";
>>>
>>>         yarn.resourcemanager.client.thread-count = "64";
>>>         yarn.resourcemanager.scheduler.client.thread-count = "64";
>>>         yarn.resourcemanager.resource-tracker.client.thread-count = "64";
>>>         yarn.resourcemanager.scheduler.class =
>>> "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
>>>         yarn.nodemanager.aux-services = "mapreduce_shuffle";
>>>         yarn.nodemanager.aux-services.mapreduce.shuffle.class =
>>> "org.apache.hadoop.mapred.ShuffleHandler";
>>>         yarn.nodemanager.vmem-pmem-ratio = "5";
>>>         yarn.nodemanager.container-executor.class =
>>> "org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
>>>         yarn.nodemanager.container-manager.thread-count = "64";
>>>         yarn.nodemanager.localizer.client.thread-count = "20";
>>>         yarn.nodemanager.localizer.fetch.thread-count = "20";
>>>
>>> My Hadoop 1.0.3 has the same memory/disks/cores and almost the same
>>> other configurations. In MR1, the 1TB terasort took about 45 minutes, but
>>> it took around 90 minutes in MR2.
>>>
>>> Does anyone know what is wrong here? Or do I need some special
>>> configurations for terasort to work better in MR2?
>>>
>>> Thanks in advance,
>>>
>>> John
>>>
>>>
>>> On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <michael_segel@hotmail.com> wrote:
>>>
>>>> Sam,
>>>> I think your cluster is too small for any meaningful conclusions to be
>>>> made.
>>>>
>>>>
>>>> Sent from a remote device. Please excuse any typos...
>>>>
>>>> Mike Segel
>>>>
>>>> On Jun 18, 2013, at 3:58 AM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>
>>>> Hi Harsh,
>>>>
>>>> Thanks for your detailed response! Now, the efficiency of my Yarn
>>>> cluster improved a lot after increasing the reducer number
>>>> (mapreduce.job.reduces) in mapred-site.xml. But I still have some
>>>> questions about how Yarn executes an MRv1 job:
>>>>
>>>> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks
>>>> together, with a typical process (map > shuffle > reduce). In Yarn, as
>>>> far as I know, an MRv1 job is executed only by the ApplicationMaster.
>>>> - Yarn can run multiple kinds of jobs (MR, MPI, ...), but an MRv1 job
>>>> has a special execution process (map > shuffle > reduce) in Hadoop 1.x.
>>>> How does Yarn execute an MRv1 job? Does it still include the special MR
>>>> steps from Hadoop 1.x, like map, sort, merge, combine and shuffle?
>>>> - Do the MRv1 parameters still work for Yarn? Like
>>>> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
>>>> - What's the general process for ApplicationMaster of Yarn to execute a
>>>> job?
>>>>
>>>> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
>>>> 'mapred.tasktracker.map.tasks.maximum' and
>>>> 'mapred.tasktracker.reduce.tasks.maximum'
>>>> - For Yarn, the above two parameters do not work any more, as Yarn uses
>>>> containers instead, right?
>>>> - For Yarn, we can set the whole physical mem for a NodeManager using '
>>>> yarn.nodemanager.resource.memory-mb'. But how to set the default size
>>>> of physical mem of a container?
>>>> - How to set the maximum size of physical mem of a container? By the
>>>> parameter of 'mapred.child.java.opts'?
>>>>
>>>> Thanks as always!
>>>>
>>>> 2013/6/9 Harsh J <harsh@cloudera.com>
>>>>
>>>>> Hi Sam,
>>>>>
>>>>> > - How to know the container number? Why you say it will be 22
>>>>> > containers due to a 22 GB memory?
>>>>>
>>>>> MR2's default configuration requests 1 GB of memory each for map and
>>>>> reduce containers, and additionally 1.5 GB for the AM container that
>>>>> runs the job. This is tunable using the properties Sandy mentioned in
>>>>> his post.
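>>>>>
>>>>> As a reference sketch, the corresponding mapred-site.xml entries look
>>>>> roughly like this (the values shown are the stock defaults as I recall
>>>>> them, so treat them as approximate):
>>>>>
>>>>>   <!-- per-container memory requests for map tasks, reduce tasks,
>>>>>        and the MR ApplicationMaster -->
>>>>>   <property>
>>>>>     <name>mapreduce.map.memory.mb</name>
>>>>>     <value>1024</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>mapreduce.reduce.memory.mb</name>
>>>>>     <value>1024</value>
>>>>>   </property>
>>>>>   <property>
>>>>>     <name>yarn.app.mapreduce.am.resource.mb</name>
>>>>>     <value>1536</value>
>>>>>   </property>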
>>>>>
>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>> > assigned to containers?
>>>>>
>>>>> This is a general question. You may use the same process you took to
>>>>> decide optimal number of slots in MR1 to decide this here. Every
>>>>> container is a new JVM, and you're limited by the CPUs you have there
>>>>> (if not the memory). Either increase memory requests from jobs, to
>>>>> lower # of concurrent containers at a given time (runtime change), or
>>>>> lower NM's published memory resources to control the same (config
>>>>> change).
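>>>>>
>>>>> (A rough worked example of that trade-off, using numbers from this
>>>>> thread: a NodeManager publishing 22000 MB with 1024 MB requested per
>>>>> container can run about 22000 / 1024 ≈ 21 containers concurrently;
>>>>> raising the per-task request to 2048 MB roughly halves that to 10.)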
>>>>>
>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>> > 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>> > framework? Like 'mapreduce.task.io.sort.mb' and
>>>>> > 'mapreduce.map.sort.spill.percent'
>>>>>
>>>>> Yes, all of these properties will still work. Old properties specific
>>>>> to JobTracker or TaskTracker (usually found as a keyword in the config
>>>>> name) will not apply anymore.
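>>>>>
>>>>> For example (a sketch using property names that already appear in this
>>>>> thread, with values taken from the configs posted here):
>>>>>
>>>>>   <!-- still honored when mapreduce.framework.name is set to yarn -->
>>>>>   <property>
>>>>>     <name>mapreduce.task.io.sort.mb</name>
>>>>>     <value>200</value>
>>>>>   </property>
>>>>>   <!-- ignored under YARN, since there is no TaskTracker any more -->
>>>>>   <property>
>>>>>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>>     <value>8</value>
>>>>>   </property>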
>>>>>
>>>>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>> > Hi Harsh,
>>>>> >
>>>>> > According to the above suggestions, I removed the duplicated
>>>>> > setting, and reduced the values of
>>>>> > 'yarn.nodemanager.resource.cpu-cores',
>>>>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>>>>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then,
>>>>> > the efficiency improved by about 18%.  I have questions:
>>>>> >
>>>>> > - How to know the container number? Why you say it will be 22
>>>>> > containers due to a 22 GB memory?
>>>>> > - My machine has 32 GB memory, how many memory is proper to be
>>>>> > assigned to containers?
>>>>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to be
>>>>> > 'yarn', will other parameters for mapred-site.xml still work in yarn
>>>>> > framework? Like 'mapreduce.task.io.sort.mb' and
>>>>> > 'mapreduce.map.sort.spill.percent'
>>>>> >
>>>>> > Thanks!
>>>>> >
>>>>> >
>>>>> >
>>>>> > 2013/6/8 Harsh J <harsh@cloudera.com>
>>>>> >>
>>>>> >> Hey Sam,
>>>>> >>
>>>>> >> Did you get a chance to retry with Sandy's suggestions? The config
>>>>> >> appears to be asking NMs to use roughly 22 total containers (as
>>>>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>>>>> >> resource. This could impact much, given the CPU is still the same
>>>>> >> for both test runs.
>>>>> >>
>>>>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>>>>> >> > Hey Sam,
>>>>> >> >
>>>>> >> > Thanks for sharing your results.  I'm definitely curious about
>>>>> >> > what's causing the difference.
>>>>> >> >
>>>>> >> > A couple observations:
>>>>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in
>>>>> >> > there twice with two different values.
>>>>> >> >
>>>>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the
>>>>> >> > default mapreduce.map/reduce.memory.mb of 1024 MB. Are any of
>>>>> >> > your tasks getting killed for running over resource limits?
>>>>> >> >
>>>>> >> > -Sandy
>>>>> >> >
>>>>> >> >
>>>>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>> >> >>
>>>>> >> >> The terasort execution log shows that reduce spent about 5.5
>>>>> >> >> mins from 33% to 35%, as below.
>>>>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>>>>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>>>>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>>>>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>>>>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>>>>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>>>>> >> >>
>>>>> >> >> Anyway, below are my configurations for your reference. Thanks!
>>>>> >> >> (A) core-site.xml
>>>>> >> >> only define 'fs.default.name' and 'hadoop.tmp.dir'
>>>>> >> >>
>>>>> >> >> (B) hdfs-site.xml
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.replication</name>
>>>>> >> >>     <value>1</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.name.dir</name>
>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.data.dir</name>
>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.block.size</name>
>>>>> >> >>     <value>134217728</value><!-- 128MB -->
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.namenode.handler.count</name>
>>>>> >> >>     <value>64</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>dfs.datanode.handler.count</name>
>>>>> >> >>     <value>10</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> (C) mapred-site.xml
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>>>>> >> >>     <description>No description</description>
>>>>> >> >>     <final>true</final>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.cluster.local.dir</name>
>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>>>>> >> >>     <description>No description</description>
>>>>> >> >>     <final>true</final>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>   <name>mapreduce.child.java.opts</name>
>>>>> >> >>   <value>-Xmx1000m</value>
>>>>> >> >> </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>     <name>mapreduce.framework.name</name>
>>>>> >> >>     <value>yarn</value>
>>>>> >> >>    </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>>>>> >> >>     <value>8</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>>>>> >> >>     <value>4</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>>>>> >> >>     <value>true</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> (D) yarn-site.xml
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>>>>> >> >>     <value>node1:18025</value>
>>>>> >> >>     <description>host is the hostname of the resource manager and
>>>>> >> >>     port is the port on which the NodeManagers contact the
>>>>> >> >>     Resource Manager.
>>>>> >> >>     </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <description>The address of the RM web
>>>>> application.</description>
>>>>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>>>>> >> >>     <value>node1:18088</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>>>>> >> >>     <value>node1:18030</value>
>>>>> >> >>     <description>host is the hostname of the resourcemanager and
>>>>> >> >>     port is the port on which the Applications in the cluster
>>>>> >> >>     talk to the Resource Manager.
>>>>> >> >>     </description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.address</name>
>>>>> >> >>     <value>node1:18040</value>
>>>>> >> >>     <description>the host is the hostname of the ResourceManager
>>>>> >> >>     and the port is the port on which the clients can talk to
>>>>> >> >>     the Resource Manager.</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>>>>> >> >>     <description>the local directories used by the
>>>>> >> >>     nodemanager</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.address</name>
>>>>> >> >>     <value>0.0.0.0:18050</value>
>>>>> >> >>     <description>the nodemanagers bind to this port</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>> >> >>     <value>10240</value>
>>>>> >> >>     <description>the amount of memory on the NodeManager in
>>>>> >> >>     GB</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>>>>> >> >>     <description>directory on hdfs where the application logs
>>>>> >> >>     are moved to</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>    <property>
>>>>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>>>>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>>>>> >> >>     <description>the directories used by Nodemanagers as log
>>>>> >> >>     directories</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.nodemanager.aux-services</name>
>>>>> >> >>     <value>mapreduce.shuffle</value>
>>>>> >> >>     <description>shuffle service that needs to be set for Map
>>>>> >> >>     Reduce to run</description>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>   <property>
>>>>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>>>>> >> >>     <value>64</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>>>>> >> >>     <value>24</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >> <property>
>>>>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>>>>> >> >>     <value>3</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>>>>> >> >>     <value>22000</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>  <property>
>>>>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>>>>> >> >>     <value>2.1</value>
>>>>> >> >>   </property>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> 2013/6/7 Harsh J <harsh@cloudera.com>
>>>>> >> >>>
>>>>> >> >>> Not tuning configurations at all is wrong. YARN uses memory
>>>>> >> >>> resource based scheduling and hence MR2 would be requesting
>>>>> >> >>> 1 GB minimum by default, causing, on base configs, to max out
>>>>> >> >>> at 8 (due to 8 GB NM memory resource config) total containers.
>>>>> >> >>> Do share your configs, as at this point none of us can tell
>>>>> >> >>> what it is.
>>>>> >> >>>
>>>>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and
>>>>> >> >>> to not care about such things :)
>>>>> >> >>>
>>>>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <samliuhadoop@gmail.com> wrote:
>>>>> >> >>> > At the beginning, I just wanted to do a fast comparison of
>>>>> >> >>> > MRv1 and Yarn. But they have many differences, and to be fair
>>>>> >> >>> > for the comparison I did not tune their configurations at all.
>>>>> >> >>> > So I got the above test results. After analyzing the test
>>>>> >> >>> > result, no doubt, I will configure them and do the comparison
>>>>> >> >>> > again.
>>>>> >> >>> >
>>>>> >> >>> > Do you have any idea on the current test result? I think,
>>>>> >> >>> > compared with MRv1, Yarn is better in the Map phase (teragen
>>>>> >> >>> > test), but worse in the Reduce phase (terasort test).
>>>>> >> >>> > Any detailed suggestions/comments/materials on Yarn
>>>>> >> >>> > performance tuning?
>>>>> >> >>> >
>>>>> >> >>> > Thanks!
>>>>> >> >>> >
>>>>> >> >>> >
>>>>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <
>>>>> marcosluis2186@gmail.com>
>>>>> >> >>> >>
>>>>> >> >>> >> Why not tune the configurations?
>>>>> >> >>> >> Both frameworks have many areas to tune:
>>>>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc.
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >> 2013/6/6 sam liu <samliuhadoop@gmail.com>
>>>>> >> >>> >>>
>>>>> >> >>> >>> Hi Experts,
>>>>> >> >>> >>>
>>>>> >> >>> >>> We are thinking about whether to use Yarn or not in the
>>>>> >> >>> >>> near future, and I ran teragen/terasort on Yarn and MRv1
>>>>> >> >>> >>> for comparison.
>>>>> >> >>> >>>
>>>>> >> >>> >>> My env is a three-node cluster, and each node has similar
>>>>> >> >>> >>> hardware: 2 cpu (4 core), 32 GB mem. Both the Yarn and MRv1
>>>>> >> >>> >>> clusters are set up on the same env. To be fair, I did not
>>>>> >> >>> >>> make any performance tuning on their configurations, but
>>>>> >> >>> >>> used the default configuration values.
>>>>> >> >>> >>>
>>>>> >> >>> >>> Before testing, I thought Yarn would be much better than
>>>>> >> >>> >>> MRv1 if they both use the default configuration, because
>>>>> >> >>> >>> Yarn is a better framework than MRv1. However, the test
>>>>> >> >>> >>> result shows some differences:
>>>>> >> >>> >>>
>>>>> >> >>> >>> MRv1: Hadoop-1.1.1
>>>>> >> >>> >>> Yarn: Hadoop-2.0.4
>>>>> >> >>> >>>
>>>>> >> >>> >>> (A) Teragen: generate 10 GB data:
>>>>> >> >>> >>> - MRv1: 193 sec
>>>>> >> >>> >>> - Yarn: 69 sec
>>>>> >> >>> >>> Yarn is 2.8 times better than MRv1
>>>>> >> >>> >>>
>>>>> >> >>> >>> (B) Terasort: sort 10 GB data:
>>>>> >> >>> >>> - MRv1: 451 sec
>>>>> >> >>> >>> - Yarn: 1136 sec
>>>>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>>>>> >> >>> >>>
>>>>> >> >>> >>> After a fast analysis, I think the direct cause might be
>>>>> >> >>> >>> that Yarn is much faster than MRv1 in the Map phase, but
>>>>> >> >>> >>> much worse in the Reduce phase.
>>>>> >> >>> >>>
>>>>> >> >>> >>> Here I have two questions:
>>>>> >> >>> >>> - Why do my tests show Yarn is worse than MRv1 for terasort?
>>>>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there
>>>>> >> >>> >>> any materials?
>>>>> >> >>> >>>
>>>>> >> >>> >>> Thanks!
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >>
>>>>> >> >>> >> --
>>>>> >> >>> >> Marcos Ortiz Valmaseda
>>>>> >> >>> >> Product Manager at PDVSA
>>>>> >> >>> >> http://about.me/marcosortiz
>>>>> >> >>> >>
>>>>> >> >>> >
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>>
>>>>> >> >>> --
>>>>> >> >>> Harsh J
>>>>> >> >>
>>>>> >> >>
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Harsh J
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>
>>
>
