I have been experimenting with mapreduce.tasktracker.map.tasks.maximum to reduce the load
on my Cassandra cluster (the jobs read through the Cassandra ColumnFamilyInputFormat). I'd like
to find ways of throttling the map operations in case they are affecting OLTP activity on the
cluster.
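
For reference, here is a rough sketch of how that setting looks in mapred-site.xml on each task tracker node. The value 2 is only an illustrative placeholder, not the number I am actually running with, and as far as I know the task tracker has to be restarted to pick up a change:

```xml
<!-- mapred-site.xml on each task tracker node -->
<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <!-- illustrative value: at most 2 map tasks run concurrently on this node -->
  <value>2</value>
</property>
```

Since this is a per-node cap, the cluster-wide ceiling is roughly this value multiplied by the number of task trackers, and it applies to every job rather than to one job in particular.
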
What parameters can I use to limit the number of map tasks running concurrently across the
whole cluster? mapreduce.tasktracker.map.tasks.maximum limits the number of concurrent maps
per task tracker, but can I set a limit at the job level?
Should I look at the "fair" scheduler?
Regards,
Michael