hadoop-general mailing list archives

From Michael Moores <mmoo...@real.com>
Subject Re: Limiting concurrent maps
Date Fri, 22 Oct 2010 00:30:40 GMT
I don't see how the Capacity Scheduler could limit the number of maps running concurrently
across the whole cluster, even if this is the only job running.

But maybe it's possible with the Fair Scheduler's mapred.fairscheduler.loadmanager extension point:

mapred.fairscheduler.loadmanager        An extensibility point that lets you specify a class
that determines how many maps and reduces can run on a given TaskTracker. This class should
implement the LoadManager interface. By default the task caps in the Hadoop config file are
used, but this option could be used to make the load based on available memory and CPU utilization
for example.
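To make the extension point concrete, here is a standalone sketch of the decision logic a custom LoadManager might apply. Note this does not use the real Hadoop API: the class, constructor, and method names below are illustrative only, and the actual LoadManager interface (its method names and the tracker/slot arguments it passes) should be checked against the fair scheduler source for your Hadoop version.

```java
// Sketch of the cap logic a custom LoadManager implementation might apply.
// All names here are illustrative, not the real Hadoop fair scheduler API.
public class MapCapSketch {
    private final int clusterWideMapCap;

    public MapCapSketch(int clusterWideMapCap) {
        this.clusterWideMapCap = clusterWideMapCap;
    }

    // Mirrors the decision a LoadManager override would make when the
    // scheduler asks whether a tracker may launch another map task:
    // refuse once the cluster-wide running count reaches the cap.
    public boolean canLaunchMap(int runningMapsAcrossCluster) {
        return runningMapsAcrossCluster < clusterWideMapCap;
    }

    public static void main(String[] args) {
        MapCapSketch cap = new MapCapSketch(10);
        System.out.println(cap.canLaunchMap(5));   // true: under the cap
        System.out.println(cap.canLaunchMap(10));  // false: cap reached
    }
}
```

A real implementation would subclass the fair scheduler's LoadManager, be packaged on the JobTracker's classpath, and be wired in via the mapred.fairscheduler.loadmanager property quoted above.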

On Oct 20, 2010, at 4:32 PM, james warren wrote:

Hi Michael,

Any of the tasktracker configs affects only the local tasktracker daemon, not the
other servers in your cluster.  Moreover, they can't be overridden by a job
configuration.  Sounds like you're in need of a job scheduler; I personally
prefer to use the Fair Scheduler, but I'm sure the Capacity Scheduler would suit
your needs as well.
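For what it's worth, later Fair Scheduler versions (if memory serves, around 0.21 and some vendor backports; check the docs for your version) added per-pool maxMaps and maxReduces caps in the allocation file, which would limit concurrent map tasks cluster-wide for every job in that pool without writing any code. A sketch of such an allocation file entry, with a hypothetical pool name:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml allocation file; pool name "cassandra-etl" is
     hypothetical, and maxMaps/maxReduces require a fair scheduler
     version that supports per-pool task caps. -->
<allocations>
  <pool name="cassandra-etl">
    <maxMaps>10</maxMaps>
    <maxReduces>4</maxReduces>
  </pool>
</allocations>
```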


On Wed, Oct 20, 2010 at 3:41 PM, Michael Moores <mmoores@real.com> wrote:

I have been playing with mapreduce.tasktracker.map.tasks.maximum to reduce
the load on my Cassandra cluster (using the Cassandra ColumnFamilyInputFormat).  I'd
like to find ways of throttling the map operations in case they affect
OLTP activity on the cluster.

What parameters can I use to limit the number of map tasks running
concurrently across the whole cluster?
mapreduce.tasktracker.map.tasks.maximum limits the number of concurrent maps
per task tracker, but can I do this at the job level?

Should I look at the "fair" scheduler?
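For reference, switching the JobTracker to the Fair Scheduler in this era of Hadoop is done in mapred-site.xml (property and class names here are from the 0.20-era fair scheduler docs; verify them against your Hadoop version, and make sure the fair scheduler jar is on the JobTracker's classpath):

```xml
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
```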

