hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Segel <michael_se...@hotmail.com>
Subject RE: Performance Tunning
Date Tue, 28 Jun 2011 18:30:31 GMT


I understood that Juan was talking about a 2 socket quad core box.  We run boxes with the
e5500 (xeon quad core ) chips. Linux sees these as 16 cores. 
Our data nodes are 32GB Ram w 4 x 2TB SATA. Its a pretty basic configuration. 

What I was saying was that if you consider 1 core for each TT, DN and RS jobs, thats 3 out
of the 8 physical cores, leaving you 5 cores or 10 'hyperthread cores'.
So you could put up 10 m/r slots on the machine.  Note that on the main tasks (TT, DN, RS)
I dedicate the physical core.

Of course your mileage may vary if you're doing non-standard or normal things.  A good starting
point is 6 mappers and 4 reducers. 
And of course YMMV depending on if you're using MapR's release, Cloudera, and if you're running
HBase or something else on the cluster.

>From our experience... we end up getting disk I/O bound first, and then network or memory
becomes the next constraint. Really the xeon chipsets are really good. 



> From: matthew.goeke@monsanto.com
> To: common-user@hadoop.apache.org
> Subject: RE: Performance Tunning
> Date: Tue, 28 Jun 2011 14:46:40 +0000
> Mike,
> I'm not really sure I have seen a community consensus around how to handle hyper-threading
within Hadoop (although I have seen quite a few articles that discuss it). I was assuming
that when Juan mentioned they were 4-core boxes that he meant 4 physical cores and not HT
cores. I was more stating that the starting point should be 1 slot per thread (or hyper-threaded
core) but obviously reviewing the results from ganglia, or any other monitoring solution,
will help you come up with a more concrete configuration based on the load.
> My brain might not be working this morning but how did you get the 10 slots again? That
seems low for an 8 physical core box but somewhat overextending for a 4 physical core box.
> Matt
> -----Original Message-----
> From: im_gumby@hotmail.com [mailto:im_gumby@hotmail.com] On Behalf Of Michel Segel
> Sent: Tuesday, June 28, 2011 7:39 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Performance Tunning
> Matt,
> You have 2 threads per core, so your Linux box thinks an 8 core box has16 cores. In my
calcs, I tend to take a whole core for TT DN and RS and then a thread per slot so you end
up w 10 slots per node. Of course memory is also a factor.
> Note this is only a starting point.you can always tune up. 
> Sent from a remote device. Please excuse any typos...
> Mike Segel
> On Jun 27, 2011, at 11:11 PM, "GOEKE, MATTHEW (AG/1000)" <matthew.goeke@monsanto.com>
> > Per node: 4 cores * 2 processes = 8 slots
> > Datanode: 1 slot
> > Tasktracker: 1 slot
> > 
> > Therefore max of 6 slots between mappers and reducers.
> > 
> > Below is part of our mapred-site.xml. The thing to keep in mind is the number of
maps is defined by the number of input splits (which is defined by your data) so you only
need to worry about setting the maximum number of concurrent processes per node. In this case
the property you want to hone in on is mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.
Keep in mind there are a LOT of other tuning improvements that can be made but it requires
an strong understanding of your job load.
> > 
> > <configuration>
> >  <property>
> >    <name>mapred.tasktracker.map.tasks.maximum</name>
> >    <value>2</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >    <value>1</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.child.java.opts</name>
> >    <value>-Xmx512m</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.compress.map.output</name>
> >    <value>true</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.output.compress</name>
> >    <value>true</value>
> >  </property>
> > 
> > 
> This e-mail message may contain privileged and/or confidential information, and is intended
to be received only by persons entitled
> to receive such information. If you have received this e-mail in error, please notify
the sender immediately. Please delete it and
> all attachments from any servers, hard drives or any other media. Other use of this e-mail
by you is strictly prohibited.
> All e-mails and attachments sent and received are subject to monitoring, reading and
archival by Monsanto, including its
> subsidiaries. The recipient of this e-mail is solely responsible for checking for the
presence of "Viruses" or other "Malware".
> Monsanto, along with its subsidiaries, accepts no liability for any damage caused by
any such code transmitted by or accompanying
> this e-mail or any attachment.
> The information contained in this email may be subject to the export control laws and
regulations of the United States, potentially
> including but not limited to the Export Administration Regulations (EAR) and sanctions
regulations issued by the U.S. Department of
> Treasury, Office of Foreign Asset Controls (OFAC).  As a recipient of this information
you are obligated to comply with all
> applicable U.S. export laws and regulations.
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message