From Michael Segel <michael_se...@hotmail.com>
Subject RE: Performance Tunning
Date Tue, 28 Jun 2011 18:30:31 GMT


I understood that Juan was talking about a 2-socket, quad-core box. We run boxes with the
E5500 (Xeon quad-core) chips. With hyper-threading, Linux sees these as 16 cores.
Our data nodes have 32GB RAM with 4 x 2TB SATA drives. It's a pretty basic configuration.

What I was saying was that if you reserve 1 core each for the TT, DN and RS daemons (TaskTracker,
DataNode, HBase RegionServer), that's 3 out of the 8 physical cores, leaving you 5 cores or
10 'hyperthread cores'.
So you could put up 10 m/r slots on the machine. Note that for the main daemons (TT, DN, RS)
I dedicate a whole physical core each.
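
A minimal sketch of that arithmetic in Python, assuming one whole physical core is reserved per daemon and one slot is counted per remaining hardware thread. The function name and parameters are illustrative only, not part of Hadoop:

```python
def mr_slots(physical_cores, threads_per_core=2, daemons=("TT", "DN", "RS")):
    """Starting-point slot count: one physical core per daemon,
    then one m/r slot per remaining hardware thread."""
    free_cores = physical_cores - len(daemons)
    if free_cores < 0:
        raise ValueError("more daemons than physical cores")
    return free_cores * threads_per_core


# 2 sockets x 4 cores = 8 physical cores, hyper-threading on
slots = mr_slots(physical_cores=8)
print(slots)  # 10
```

Treat the result as an upper bound to tune from, since disk, network and memory usually bite before CPU does.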

Of course your mileage may vary if you're doing anything non-standard. A good starting
point is 6 mappers and 4 reducers.
And of course YMMV depending on whether you're using MapR's release or Cloudera's, and whether
you're running HBase or something else on the cluster.

From our experience... we end up getting disk I/O bound first, and then network or memory
becomes the next constraint. The Xeon chipsets really are good.
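
Since memory tends to show up as a constraint after disk I/O, it can be worth sanity-checking a slot count against RAM. A rough sketch, assuming each task JVM is capped at -Xmx512m as in the quoted mapred-site.xml, and assuming a purely hypothetical 8GB reserve for the OS and the TT/DN/RS daemons (pick your own number). The function name is illustrative only:

```python
def heap_headroom_mb(ram_gb, slots, child_heap_mb=512, reserved_gb=8):
    """MB left over after all task heaps and a reserve for the OS and daemons.

    reserved_gb is an assumed placeholder, not a measured or recommended value.
    """
    total_mb = ram_gb * 1024
    task_heap_mb = slots * child_heap_mb
    return total_mb - reserved_gb * 1024 - task_heap_mb


# 32GB node running 6 mappers + 4 reducers
headroom = heap_headroom_mb(ram_gb=32, slots=6 + 4)
print(headroom)  # 19456
```

Keep in mind that -Xmx only bounds the Java heap, so actual per-task memory use will run somewhat higher than this estimate.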



> Mike,
> I'm not really sure I have seen a community consensus around how to handle hyper-threading
> within Hadoop (although I have seen quite a few articles that discuss it). I was assuming
> that when Juan mentioned they were 4-core boxes he meant 4 physical cores and not HT
> cores. I was really just saying that the starting point should be 1 slot per thread (or
> hyper-threaded core), but obviously reviewing the results from Ganglia, or any other
> monitoring solution, will help you come up with a more concrete configuration based on the load.
> My brain might not be working this morning, but how did you get the 10 slots again? That
> seems low for an 8 physical core box but somewhat overextended for a 4 physical core box.
> Matt
>
> Matt,
> You have 2 threads per core, so your Linux box thinks an 8 core box has 16 cores. In my
> calcs, I tend to take a whole core each for TT, DN and RS and then a thread per slot, so you
> end up with 10 slots per node. Of course memory is also a factor.
> Note this is only a starting point. You can always tune up.
> > Per node: 4 cores * 2 processes = 8 slots
> > Datanode: 1 slot
> > Tasktracker: 1 slot
> > 
> > Therefore max of 6 slots between mappers and reducers.
> > 
> > Below is part of our mapred-site.xml. The thing to keep in mind is that the number of
> > maps is defined by the number of input splits (which is defined by your data), so you only
> > need to worry about setting the maximum number of concurrent processes per node. In this case
> > the properties you want to home in on are mapred.tasktracker.map.tasks.maximum and
> > mapred.tasktracker.reduce.tasks.maximum.
> > Keep in mind there are a LOT of other tuning improvements that can be made, but they require
> > a strong understanding of your job load.
> > 
> > <configuration>
> >  <property>
> >    <name>mapred.tasktracker.map.tasks.maximum</name>
> >    <value>2</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.tasktracker.reduce.tasks.maximum</name>
> >    <value>1</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.child.java.opts</name>
> >    <value>-Xmx512m</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.compress.map.output</name>
> >    <value>true</value>
> >  </property>
> > 
> >  <property>
> >    <name>mapred.output.compress</name>
> >    <value>true</value>
> >  </property>
> > 
> > </configuration>
