hadoop-common-user mailing list archives

From Steve Lewis <lordjoe2...@gmail.com>
Subject Re: Changing the maximum tasks per node on a per job basis
Date Fri, 24 May 2013 08:17:57 GMT
My reading of the Capacity Scheduler is that it controls the number of jobs
scheduled at the level of the cluster.
My issue is not sharing at the level of the cluster (usually my job is the
only one running) but rather sharing at the level of the individual machine.
  Some of my jobs require more memory and do significant processing,
especially in the reducer. While the cluster can schedule 8 smaller tasks
on a node, when, say, 8 of the larger ones land on one node the slaves run
out of swap space and tend to crash.
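  For concreteness, this is roughly how the big job is set up (Hadoop 1.x
old-API property names as I understand them; the 2048 MB figure is the 2GB
I mentioned, the class name is just a placeholder, not our real code):

    // Sketch of how the large job asks for extra memory per task (Hadoop 1.x, old API).
    import org.apache.hadoop.mapred.JobConf;

    public class BigMemoryJobConfig {
        public static JobConf configure(JobConf conf) {
            // Heap given to each child task JVM
            conf.set("mapred.child.java.opts", "-Xmx2048m");
            // Memory the scheduler should account for per map/reduce task
            // (used by the CapacityScheduler's memory-based scheduling)
            conf.set("mapred.job.map.memory.mb", "2048");
            conf.set("mapred.job.reduce.memory.mb", "2048");
            return conf;
        }
    }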
  It is not clear to me that limiting the number of jobs on the cluster will
stop a scheduler from scheduling the maximum allowed tasks on any one node.
  Even requesting multiple slots for a job seems to affect the number of
tasks running across the cluster but not on any specific node.
  Am I wrong here? If I want, say, only three of my tasks running on one
node, does asking for enough slots to guarantee that the total number of
tasks is no more than 3 times the number of nodes guarantee this?
   My read is that the total number of running tasks might be throttled but
not the number per node.
  Perhaps a clever use of queues might help, but I am not quite sure about
the details.
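  For reference, these are the cluster-side knobs I believe are involved,
based on my reading of the 1.x CapacityScheduler docs (a sketch with
illustrative values, not our actual settings; corrections welcome):

    // Sketch of memory-based scheduling settings on the JobTracker side (Hadoop 1.x).
    // Values are illustrative only.
    import org.apache.hadoop.conf.Configuration;

    public class ClusterMemorySchedulingSketch {
        public static Configuration sketch() {
            Configuration conf = new Configuration();
            // Run the CapacityScheduler on the JobTracker
            conf.set("mapred.jobtracker.taskScheduler",
                     "org.apache.hadoop.mapred.CapacityTaskScheduler");
            // Slots per TaskTracker (the limit I do not want to cut cluster-wide)
            conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 8);
            // Memory assumed per slot. As I read the docs, a job requesting
            // 2048 MB per reduce task (mapred.job.reduce.memory.mb) should then
            // occupy 2048 / 1024 = 2 slots, so at most 4 such reducers fit on
            // an 8-slot node, which is the kind of per-node limit I am after.
            conf.setInt("mapred.cluster.reduce.memory.mb", 1024);
            conf.setInt("mapred.cluster.max.reduce.memory.mb", 4096);
            return conf;
        }
    }

If that is how it actually behaves it would answer my question, but I have
not verified it on our cluster.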

On Thu, May 23, 2013 at 4:37 PM, Harsh J <harsh@cloudera.com> wrote:

> Your problem seems to surround available memory and over-subscription. If
> you're using a 0.20.x or 1.x version of Apache Hadoop, you probably want to
> use the CapacityScheduler to address this for you.
> I once detailed how-to, on a similar question here:
> http://search-hadoop.com/m/gnFs91yIg1e
> On Wed, May 22, 2013 at 2:55 PM, Steve Lewis <lordjoe2000@gmail.com>
> wrote:
> > I have a series of Hadoop jobs to run, and one of my jobs requires larger
> > than standard memory; I allow each task to use 2GB of memory. When I run
> > some of these jobs the slave nodes crash because they run out of swap
> > space. It is not that a slave could not run one, or even 4, of these
> > tasks, but 8 stresses the limits.
> >  I could cut mapred.tasktracker.reduce.tasks.maximum for the entire
> > cluster, but this cripples the whole cluster for the sake of one of many jobs.
> > It seems to be a very bad design
> > a) to allow the JobTracker to keep assigning tasks to a slave that is
> > already getting low on memory,
> > b) to allow the user to run jobs capable of crashing nodes in the cluster, and
> > c) not to allow the user to specify that some jobs need to be limited to
> > a lower value without requiring this limit for every job.
> >
> > Are there plans to fix this??
> >
> > --
> >
> --
> Harsh J

Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
