hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "GOEKE, MATTHEW (AG/1000)" <matthew.go...@monsanto.com>
Subject RE: Performance Tunning
Date Tue, 28 Jun 2011 04:11:56 GMT
Per node: 4 cores * 2 processes = 8 slots
Datanode: 1 slot
Tasktracker: 1 slot

Therefore max of 6 slots between mappers and reducers.

Below is part of our mapred-site.xml. The thing to keep in mind is the number of maps is defined
by the number of input splits (which is defined by your data) so you only need to worry about
setting the maximum number of concurrent processes per node. In this case the property you
want to hone in on is mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.
Keep in mind there are a LOT of other tuning improvements that can be made but it requires
an strong understanding of your job load.

<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>

  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>

  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>

  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>

  <property>
    <name>mapred.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.75</value>
  </property>

  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>

  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>

  <property>
    <name>io.sort.mb</name>
    <value>256</value>
  </property>

  <property>
    <name>io.sort.factor</name>
    <value>64</value>
  </property>

  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
    <final>true</final>
  </property>

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapreduce.job.user.name</value>
    <final>true</final>
  </property>

  <!--
  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/hadoop/hadoop/conf/fair-scheduler.xml</value>
  </property>
  -->

  <property>
    <name>mapred.fairscheduler.assignmultiple</name>
    <value>true</value>
    <final>true</final>
  </property>

  <property>
    <name>mapred.hosts</name>
    <value>/hadoop/hadoop/conf/mapred-hosts-include</value>
  </property>

  <property>
    <name>mapred.hosts.exclude</name>
    <value>/hadoop/hadoop/conf/mapred-hosts-exclude</value>
  </property>
</configuration>

-----Original Message-----
From: Juan P. [mailto:gordoslocos@gmail.com] 
Sent: Monday, June 27, 2011 10:13 PM
To: common-user@hadoop.apache.org
Subject: Re: Performance Tunning

Ok,
So I tried putting the following config in the mapred-site.xml of all of my
nodes

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>name-node:54311</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>7</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>7</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>

but when I start a new job it gets stuck at

11/06/28 03:04:47 INFO mapred.JobClient:  map 0% reduce 0%

Any thoughts?
Thanks for your help guys!

On Mon, Jun 27, 2011 at 7:33 PM, Juan P. <gordoslocos@gmail.com> wrote:

> Matt,
> Thanks for your help!
> I think I get it now, but this part is a bit confusing:
> *
> *
> *so: tasktracker/datanode and 6 slots left. How you break it up from there
> is your call but I would suggest either 4 mappers / 2 reducers or 5 mappers
> / 1 reducer.*
> *
> *
> If it's 2 processes per core, then it's: 4 Nodes * 4 Cores/Node * 2
> Processes/Core = 32 Processes Total
>
> So my configuration mapred-site.xml should include these props:
>
> *<property>*
> *  <name>mapred.map.tasks</name>*
> *  <value>28</value>*
> *</property>*
> *<property>*
> *  <name>mapred.reduce.tasks</name>*
> *  <value>4</value>*
> *</property>*
> *
> *
>
> Is that correct?
>
> On Mon, Jun 27, 2011 at 4:59 PM, GOEKE, MATTHEW (AG/1000) <
> matthew.goeke@monsanto.com> wrote:
>
>> If you are running default configurations then you are only getting 2
>> mappers and 1 reducer per node. The rule of thumb I have gone on (and back
>> up by the definitive guide) is 2 processes per core so: tasktracker/datanode
>> and 6 slots left. How you break it up from there is your call but I would
>> suggest either 4 mappers / 2 reducers or 5 mappers / 1 reducer.
>>
>> Check out the below configs for details on what you are *most likely*
>> running currently:
>> http://hadoop.apache.org/common/docs/r0.20.2/mapred-default.html
>> http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html
>> http://hadoop.apache.org/common/docs/r0.20.2/core-default.html
>>
>> HTH,
>> Matt
>>
>> -----Original Message-----
>> From: Juan P. [mailto:gordoslocos@gmail.com]
>> Sent: Monday, June 27, 2011 2:50 PM
>> To: common-user@hadoop.apache.org
>> Subject: Performance Tunning
>>
>> I'm trying to run a MapReduce task against a cluster of 4 DataNodes with 4
>> cores each.
>> My input data is 4GB in size and it's split into 100MB files. Current
>> configuration is default so block size is 64MB.
>>
>> If I understand it correctly Hadoop should be running 64 Mappers to
>> process
>> the data.
>>
>> I'm running a simple data counting MapReduce and it's taking about 30mins
>> to
>> complete. This seems like way too much, doesn't it?
>> Is there any tunning you guys would recommend to try and see an
>> improvement
>> in performance?
>>
>> Thanks,
>> Pony
>> This e-mail message may contain privileged and/or confidential
>> information, and is intended to be received only by persons entitled
>> to receive such information. If you have received this e-mail in error,
>> please notify the sender immediately. Please delete it and
>> all attachments from any servers, hard drives or any other media. Other
>> use of this e-mail by you is strictly prohibited.
>>
>> All e-mails and attachments sent and received are subject to monitoring,
>> reading and archival by Monsanto, including its
>> subsidiaries. The recipient of this e-mail is solely responsible for
>> checking for the presence of "Viruses" or other "Malware".
>> Monsanto, along with its subsidiaries, accepts no liability for any damage
>> caused by any such code transmitted by or accompanying
>> this e-mail or any attachment.
>>
>>
>> The information contained in this email may be subject to the export
>> control laws and regulations of the United States, potentially
>> including but not limited to the Export Administration Regulations (EAR)
>> and sanctions regulations issued by the U.S. Department of
>> Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this
>> information you are obligated to comply with all
>> applicable U.S. export laws and regulations.
>>
>>
>

Mime
View raw message