Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-dev@hadoop.apache.org
Message-ID: <510100653.1211302195785.JavaMail.jira@brutus>
Date: Tue, 20 May 2008 09:49:55 -0700 (PDT)
From: "Doug Cutting (JIRA)" <jira@apache.org>
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-3420) Recover the deprecated
 mapred.tasktracker.tasks.maximum
In-Reply-To: <1864667794.1211285935541.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


    [ https://issues.apache.org/jira/browse/HADOOP-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598355#action_12598355 ]

Doug Cutting commented on HADOOP-3420:
--------------------------------------

Note that this differs from the former semantics of mapred.tasktracker.tasks.maximum.  Previously the value applied separately to map tasks and to reduce tasks. For example, if it was 4, then there could be up to 4 map tasks and up to 4 reduce tasks, for a total of up to 8 tasks per node.
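
To make the difference concrete, here is a rough sketch of the two readings of the parameter. The function names are hypothetical and this is not Hadoop code; it only restates the arithmetic above and, as an assumption, models the proposal as a single per-node cap applied on top of the separate map and reduce limits.

```python
# Hypothetical helpers, not part of Hadoop: they only restate the two
# readings of mapred.tasktracker.tasks.maximum discussed in this thread.

def old_semantics_max_tasks(tasks_maximum):
    """Former meaning: the value caps maps and reduces separately."""
    max_maps = tasks_maximum
    max_reduces = tasks_maximum
    return max_maps + max_reduces


def proposed_semantics_max_tasks(tasks_maximum, map_maximum, reduce_maximum):
    """Proposed meaning (assumed model): one cap on maps plus reduces per node."""
    return min(tasks_maximum, map_maximum + reduce_maximum)


print(old_semantics_max_tasks(4))             # 4 maps + 4 reduces
print(proposed_semantics_max_tasks(8, 8, 8))  # total capped at 8
```

With the old meaning a value of 4 allows 8 tasks per node; with the proposed meaning a value of 8 allows 8 tasks per node regardless of how they split between maps and reduces.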

Also note that, under your proposal, a configuration where mapred.tasktracker.tasks.maximum is not greater than mapred.tasktracker.reduce.tasks.maximum can lead to deadlock.  Suppose every slot is filled with a reduce and a node fails, triggering re-execution of its maps. No map slots are available, and currently the system will not kill a reduce task to make room, so all the reduce tasks will patiently wait forever.
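
The condition can be sketched as a small check. This is a simplified model under stated assumptions, not the TaskTracker's scheduling code: the function names are hypothetical, and it assumes a running reduce never gives up its slot.

```python
# Simplified, hypothetical model of per-node slots under the proposal.
# Assumption: a running reduce is never killed to free a slot.

def free_map_slots(tasks_maximum, map_maximum, running_maps, running_reduces):
    """Map slots still available on a node under a total cap and a map cap."""
    total_free = tasks_maximum - running_maps - running_reduces
    map_free = map_maximum - running_maps
    return max(0, min(total_free, map_free))


def reduces_can_starve_maps(tasks_maximum, reduce_maximum):
    """True when reduces alone are allowed to occupy every slot on a node."""
    return tasks_maximum <= reduce_maximum


# Total cap 8, reduce cap 8: eight reduces leave no room for a re-executed map.
print(reduces_can_starve_maps(8, 8))
print(free_map_slots(8, 8, running_maps=0, running_reduces=8))
```

Keeping the total cap strictly greater than the reduce cap leaves at least one slot per node that a reduce cannot take.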


> Recover the deprecated mapred.tasktracker.tasks.maximum
> -------------------------------------------------------
>
>                 Key: HADOOP-3420
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3420
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.16.4
>            Reporter: Iván de Prado
>
> https://issues.apache.org/jira/browse/HADOOP-1274 replaced the configuration attribute mapred.tasktracker.tasks.maximum with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum because it sometimes makes sense to have more mappers than reducers assigned to each node.
> But deprecating mapred.tasktracker.tasks.maximum can be a problem in some situations. For example, when more than one job is running, reduce tasks plus map tasks consume too many resources. To avoid these cases, an upper limit on the total number of tasks is needed. So I propose keeping the configuration parameter mapred.tasktracker.tasks.maximum as a limit on the total number of tasks. It is compatible with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.
> As an example:
> I have an 8-core, 4 GB, 4-node cluster. I want to limit the number of tasks per node to 8. 8 tasks per node would use almost 100% of the CPU and 4 GB of memory. I have set:
>   mapred.tasktracker.map.tasks.maximum -> 8
>   mapred.tasktracker.reduce.tasks.maximum -> 8
> 1) When running only one job at a time, it works smoothly: an average of 8 tasks per node, no swapping, almost 4 GB of memory in use, and 100% CPU usage.
> 2) When running more than one job at the same time, it works really badly: an average of 16 tasks per node, 8 GB of memory in use (4 GB swapped), and a lot of system CPU usage.
> So I think it makes sense to restore the old attribute mapred.tasktracker.tasks.maximum, making it compatible with the new ones.
> Task trackers would not be allowed to:
>  - run more than mapred.tasktracker.tasks.maximum tasks per node,
>  - run more than mapred.tasktracker.map.tasks.maximum mappers per node,
>  - run more than mapred.tasktracker.reduce.tasks.maximum reducers per node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.