hadoop-common-user mailing list archives

From Joe Stein <charmal...@allthingshadoop.com>
Subject Re: what affects number of reducers launched by hadoop?
Date Thu, 29 Jul 2010 12:54:46 GMT
There is no single setting for that. The maximum number of tasks per node is
the sum of the per-node map and reduce limits
(mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum). So if you set 7 for map and 6 for
reduce, you will never have more than 13 tasks running on a node as a result
of the two settings.
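
A minimal sketch of that arithmetic, using the 7 and 6 from the example above.
The class and method names are made up for illustration and are not part of
Hadoop; the sketch only adds the two per-node limits.

```java
public class NodeTaskCeiling {
    // Upper bound on tasks running at once on one node:
    // map slots plus reduce slots (hypothetical helper, not a Hadoop API).
    static int maxTasksPerNode(int mapSlots, int reduceSlots) {
        if (mapSlots < 0 || reduceSlots < 0) {
            throw new IllegalArgumentException("slot counts must be non-negative");
        }
        return mapSlots + reduceSlots;
    }

    public static void main(String[] args) {
        int mapSlots = 7;    // value of mapred.tasktracker.map.tasks.maximum
        int reduceSlots = 6; // value of mapred.tasktracker.reduce.tasks.maximum
        System.out.println(maxTasksPerNode(mapSlots, reduceSlots)); // prints 13
    }
}
```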

http://hadoop.apache.org/common/docs/current/cluster_setup.html

You can also set the maximum number of tasks per JVM
(mapred.job.reuse.jvm.num.tasks) so that a JVM is reused across tasks instead
of a new one being started for each task:
http://books.google.com/books?id=bKPEwR-Pt6EC&pg=PA170&lpg=PA170&dq=tom+white+hadoop+jvm&source=bl&ots=kOew2vedyn&sig=oHDtBJQYRbqN06y7ulq7crdvTRs&hl=en&ei=_3hRTJ7UMZTe4AaaoazrAw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBIQ6AEwAA#v=onepage&q&f=false

You need to balance RAM and CPU against everything else you are running when
you choose these values, and tune the config to get the most out of each box.
Tom White's book is a good reference on this (and everything else) too.

Here are a couple of tips and tricks you might find helpful for your first cluster:
http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
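
As a rough illustration of the reducer-count rule of thumb from my earlier
message quoted below (0.95 to 1.75 times the per-node maximum times the number
of nodes), here is a small sketch. The class and method names are hypothetical,
and I am assuming the per-node maximum is the reduce limit
(mapred.tasktracker.reduce.tasks.maximum). It uses integer arithmetic to avoid
floating-point rounding and returns the smallest and largest whole numbers
inside the range.

```java
public class ReducerEstimate {
    // Total reduce slots in the cluster (hypothetical helper, not a Hadoop API).
    static int totalSlots(int nodes, int reduceSlotsPerNode) {
        if (nodes < 0 || reduceSlotsPerNode < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return nodes * reduceSlotsPerNode;
    }

    // Smallest whole number >= 0.95 * slots, computed as ceil(95 * slots / 100).
    static int lowerBound(int nodes, int reduceSlotsPerNode) {
        return (95 * totalSlots(nodes, reduceSlotsPerNode) + 99) / 100;
    }

    // Largest whole number <= 1.75 * slots, computed as floor(175 * slots / 100).
    static int upperBound(int nodes, int reduceSlotsPerNode) {
        return (175 * totalSlots(nodes, reduceSlotsPerNode)) / 100;
    }

    public static void main(String[] args) {
        int nodes = 3;        // data nodes in the example
        int slotsPerNode = 7; // per-node maximum in the example
        System.out.println(lowerBound(nodes, slotsPerNode)); // prints 20
        System.out.println(upperBound(nodes, slotsPerNode)); // prints 36
    }
}
```

A value picked from that range is what you would pass to
job.setNumReduceTasks(...) or set as mapred.reduce.tasks.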

On Thu, Jul 29, 2010 at 6:31 AM, Abhinay Mehta <abhinay.mehta@gmail.com> wrote:

> Which configuration key controls "the number of maximum tasks per node" ?
>
>
> On 28 July 2010 20:40, Joe Stein <charmalloc@allthingshadoop.com> wrote:
>
> > mapred.tasktracker.reduce.tasks.maximum is how many you want as a ceiling
> > per node
> >
> > you need to configure *mapred.reduce.tasks* to be more than one, as it
> > defaults to 1 (which you are overriding in your code, which is why it
> > works there)
> >
> > This value should be somewhere between .95 and 1.75 times the number of
> > maximum tasks per node times the number of data nodes.
> >
> > So if you have 3 data nodes, each set up with a max of 7 tasks, then
> > configure this between 20 and 36 (0.95 * 21 = 19.95, 1.75 * 21 = 36.75)
> >
> > On Wed, Jul 28, 2010 at 3:24 PM, Vitaliy Semochkin <vitaliy.se@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > in my cluster mapred.tasktracker.reduce.tasks.maximum = 4
> > > however during monitoring the job in job tracker I see only 1 reducer
> > > working
> > >
> > > first it is
> > > reduce > copy - can someone please explain what this means?
> > >
> > > after it is
> > > reduce > reduce
> > >
> > > when I set the number of reduce tasks for a job programmatically to 10
> > > job.setNumReduceTasks(10);
> > > the number of "reduce > reduce" reducers increases to 10 and the
> > > performance of the application increases as well (the number of
> > > reducers never exceeds that).
> > >
> > > Can someone explain such behavior?
> > >
> > > Thanks in Advance,
> > > Vitaliy S
> > >
> >
> >
> >
> > --
> >
> > /*
> > Joe Stein
> > http://www.linkedin.com/in/charmalloc
> > Twitter: @allthingshadoop
> > */
> >
>



-- 

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop
*/
