hadoop-common-dev mailing list archives

From Iván de Prado (JIRA) <j...@apache.org>
Subject [jira] Commented: (HADOOP-3420) Recover the deprecated mapred.tasktracker.tasks.maximum
Date Wed, 21 May 2008 10:11:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598614#action_12598614 ]

Iván de Prado commented on HADOOP-3420:

I understand; so the solution is not that easy. The problem I see with the current configuration
schema arises for clusters that usually execute jobs in sequence but sometimes run jobs in
parallel. Suppose you have nodes with N CPUs and enough memory to execute at most N tasks per
node. If you want to be able to run some jobs in parallel, you have to configure N/2 maximum
maps and N/2 maximum reduces per node, but then the cluster only takes advantage of half of its
resources when executing jobs sequentially.
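
For instance, with the two separate limits, a node with 8 CPUs that should run at most 8
concurrent tasks has to be configured along these lines (a sketch of the N/2 workaround,
assuming N = 8):

    mapred.tasktracker.map.tasks.maximum -> 4
    mapred.tasktracker.reduce.tasks.maximum -> 4

A job running on its own then never gets more than 4 map slots on that node, even while it has
no reduces running.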

Is it possible to have a configuration schema that lets sequential jobs use all of the
resources, while keeping parallel job executions from exceeding the available resources?

Does it make sense to have a mapred.tasktracker.tasks.maximum that limits the total number of
tasks per node, while forcing mapred.tasktracker.reduce.tasks.maximum to be smaller than
mapred.tasktracker.tasks.maximum to avoid the possible deadlock?
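
To make the idea concrete, here is a minimal sketch of how a task tracker could enforce such a
combined limit. This is not existing Hadoop code; the class, field, and method names are
hypothetical, and only the three configuration properties are real:

    // Sketch of the proposed per-node limits (hypothetical names).
    class ProposedTaskLimits {
        private final int maxTotalTasks;   // restored mapred.tasktracker.tasks.maximum
        private final int maxMapTasks;     // mapred.tasktracker.map.tasks.maximum
        private final int maxReduceTasks;  // mapred.tasktracker.reduce.tasks.maximum
        private int runningMaps;
        private int runningReduces;

        ProposedTaskLimits(int maxTotal, int maxMaps, int maxReduces) {
            // Keep the reduce limit strictly below the total limit so at least
            // one slot can always go to a map task; otherwise reduces could hold
            // every slot while waiting for map output that can never be produced,
            // which is the deadlock mentioned above.
            if (maxReduces >= maxTotal) {
                throw new IllegalArgumentException(
                    "mapred.tasktracker.reduce.tasks.maximum must be smaller "
                    + "than mapred.tasktracker.tasks.maximum");
            }
            this.maxTotalTasks = maxTotal;
            this.maxMapTasks = maxMaps;
            this.maxReduceTasks = maxReduces;
        }

        boolean canAcceptMap() {
            return runningMaps + runningReduces < maxTotalTasks
                && runningMaps < maxMapTasks;
        }

        boolean canAcceptReduce() {
            return runningMaps + runningReduces < maxTotalTasks
                && runningReduces < maxReduceTasks;
        }
    }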

Thanks for your amazing open-source project.

> Recover the deprecated mapred.tasktracker.tasks.maximum
> -------------------------------------------------------
>                 Key: HADOOP-3420
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3420
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.16.4
>            Reporter: Iván de Prado
> https://issues.apache.org/jira/browse/HADOOP-1274 replaced the configuration attribute
mapred.tasktracker.tasks.maximum with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
because it sometimes makes sense to have more mappers than reducers assigned to each node.
> But deprecating mapred.tasktracker.tasks.maximum could be an issue in some situations.
For example, when more than one job is running, the reduce tasks plus the map tasks eat too
many resources. To avoid these cases, an upper limit on the number of tasks is needed. So I
propose to keep the configuration parameter mapred.tasktracker.tasks.maximum as a total limit
on tasks; it is compatible with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.
> As an example:
> I have a cluster of 4 nodes, each with 8 cores and 4 GB of memory. I want to limit the number
of tasks per node to 8; 8 tasks per node would use almost 100% of the CPU and the 4 GB of memory. I have set:
>   mapred.tasktracker.map.tasks.maximum -> 8
>   mapred.tasktracker.reduce.tasks.maximum -> 8 
> 1) When running only one job at a time, it works smoothly: 8 tasks on average per node, no
swapping on the nodes, almost 4 GB of memory used and 100% CPU usage.
> 2) When running more than one job at a time, it works really badly: 16 tasks on average per
node, 8 GB of memory used (4 GB swapped), and a lot of system CPU usage.
> So, I think it makes sense to restore the old attribute mapred.tasktracker.tasks.maximum,
making it compatible with the new ones.
> Task trackers would then not:
>  - run more than mapred.tasktracker.tasks.maximum tasks per node,
>  - run more than mapred.tasktracker.map.tasks.maximum mappers per node, 
>  - run more than mapred.tasktracker.reduce.tasks.maximum reducers per node. 
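
Under that proposal, the example cluster above could simply be configured as (a sketch reusing
the numbers from the description; the comment above would additionally require the reduce
maximum to stay below the total):

    mapred.tasktracker.tasks.maximum -> 8
    mapred.tasktracker.map.tasks.maximum -> 8
    mapred.tasktracker.reduce.tasks.maximum -> 8

so that a single job could still fill all 8 slots, while several concurrent jobs together could
never exceed 8 tasks per node.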

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
