hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Why not having mapred.tasktracker.tasks.maximum?
Date Fri, 11 Jun 2010 15:45:55 GMT
On Fri, Jun 11, 2010 at 8:35 AM, Sébastien Rainville
<sebastienrainville@gmail.com> wrote:

> Hi,
>
> I'm playing around with the Hadoop config to optimize resource usage on
> our cluster. I'm noticing that the CPU usage is sub-optimal. All the
> machines in the cluster have one quad-core CPU. I looked at our
> mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum settings: the max map tasks is
> set to 2 and the max reduce tasks is set to 1, keeping 1 core for running
> the database (Cassandra) and the OS.
>
> My question is: why are the settings for map tasks and reduce tasks
> separate? I feel like what I want is to set
> mapred.tasktracker.tasks.maximum=3, so that all the CPUs are always
> available for both map and reduce tasks.
>
> Am I missing something?
>
> Thanks,
> Sebastien
>

That suggestion makes sense. As you run more concurrent jobs you may find
that having dedicated slots for reduce tasks is useful. You would not want a
cluster running 600 mappers and 0 reducers :)
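
For reference, the per-node limits described in the quoted message would
look roughly like the fragment below. This is only a sketch: it assumes the
properties live in mapred-site.xml (the file name is an assumption, not
stated in the thread), and the values are the ones quoted above. The single
combined limit proposed in the question is not shown, since the thread does
not confirm that such a property is available.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumed to be placed in mapred-site.xml on each TaskTracker node. -->
<configuration>
  <!-- At most 2 map tasks run at the same time on this node. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- At most 1 reduce task runs at the same time on this node. -->
  <!-- This slot stays reserved for reducers even when many maps are queued. -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

With these values, 3 of the 4 cores can be busy with tasks at once, but a
node never runs more than 2 maps or more than 1 reduce, which is the
trade-off discussed above.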
