hadoop-common-user mailing list archives

From Safdar Kureishy <safdar.kurei...@gmail.com>
Subject Re: Restricting the number of slave nodes used for a given job (regardless of the # of map/reduce tasks involved)
Date Mon, 10 Sep 2012 21:32:28 GMT
Thanks Bertrand/Hemanth, for your prompt replies! This helps :)

Regards,
Safdar


On Mon, Sep 10, 2012 at 2:18 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:

> If that is only for benchmarking, you could stop the task-trackers on the
> machines you don't want to use.
> Or you could set up another cluster.
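>
> For instance, on each node you want to exclude (assuming a 1.x-style install
> where the standard daemon scripts are available), something like:
>
>   $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
>
> and start it again with "start tasktracker" once the benchmarking is done.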
>
> But yes, there is no standard way to limit the slots taken by a job to a
> specified set of machines.
> You might be able to do it using a custom scheduler, but that would be out
> of your scope, I guess.
>
> Regards
>
> Bertrand
>
> On Mon, Sep 10, 2012 at 12:01 PM, Hemanth Yamijala <yhemanth@gmail.com> wrote:
>
> > Hi,
> >
> > I am not sure if there's any way to restrict the tasks to specific
> > machines. However, I think there are some ways of restricting the
> > number of 'slots' that can be used by the job.
> >
> > Also, not sure which version of Hadoop you are on. The CapacityScheduler
> > (http://hadoop.apache.org/common/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
> > has ways by which you can set up a queue with a hard capacity limit.
> > The capacity controls the number of slots that can be used by
> > jobs submitted to the queue. So, if you submit a job to the queue,
> > irrespective of the number of tasks it has, it should limit it to
> > those slots. However, please note that this does not restrict the
> > tasks to specific machines.
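> >
> > As a rough sketch, assuming the 1.x CapacityScheduler (the property names
> > differ on YARN), capacity-scheduler.xml could cap a 'benchmark' queue at
> > half the cluster's slots:
> >
> >   <property>
> >     <name>mapred.capacity-scheduler.queue.benchmark.capacity</name>
> >     <value>50</value>
> >   </property>
> >   <property>
> >     <name>mapred.capacity-scheduler.queue.benchmark.maximum-capacity</name>
> >     <value>50</value>
> >   </property>
> >
> > The job would then be submitted with -Dmapred.job.queue.name=benchmark,
> > with the queue itself first listed under mapred.queue.names in
> > mapred-site.xml.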
> >
> > Thanks
> > Hemanth
> >
> > On Mon, Sep 10, 2012 at 2:36 PM, Safdar Kureishy
> > <safdar.kureishy@gmail.com> wrote:
> > > Hi,
> > >
> > > I need to run some benchmarking tests for a given mapreduce job on a
> > > *subset* of a 10-node Hadoop cluster. Not that it matters, but the current
> > > cluster settings allow for ~20 map slots and 10 reduce slots per node.
> > >
> > > Without loss of generality, let's say I want a job with these
> > > constraints below:
> > > - to use only *5* out of the 10 nodes for running the mappers,
> > > - to use only *5* out of the 10 nodes for running the reducers.
> > >
> > > Is there any other way of achieving this through Hadoop property overrides
> > > during job-submission time? I understand that the Fair Scheduler can
> > > potentially be used to create pools of a proportionate # of mappers and
> > > reducers, to achieve a similar outcome, but the problem is that I still
> > > cannot tie such a pool to a fixed # of machines (right?). Essentially,
> > > regardless of the # of map/reduce tasks involved, I only want a *fixed # of
> > > machines* to handle the job.
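> > >
> > > For instance, I could cap a pool in the Fair Scheduler allocation file
> > > along these lines (the numbers are just illustrative):
> > >
> > >   <allocations>
> > >     <pool name="benchmark">
> > >       <maxMaps>100</maxMaps>
> > >       <maxReduces>50</maxReduces>
> > >     </pool>
> > >   </allocations>
> > >
> > > but that caps the number of concurrent slots, not which machines they run
> > > on.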
> > >
> > > Any tips on how I can go about achieving this?
> > >
> > > Thanks,
> > > Safdar
> >
>
>
>
> --
> Bertrand Dechoux
>
