hadoop-mapreduce-user mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: how to set max map tasks individually for each job?
Date Sat, 05 Jun 2010 04:13:37 GMT
On Thu, Jun 3, 2010 at 4:45 AM, Alex Munteanu <alex@geraeteturnen.com> wrote:
> Hello,
> I am running several different mapreduce jobs. For some of them it is
> better to have a rather high number of running map tasks per node,
> whereas others do very intensive read operations on our database
> resulting in read timeouts. So for these jobs I'd like to set a much
> smaller limit of concurrently running map tasks.
> I tried to overwrite the "mapred.tasktracker.map.tasks.maximum" value in
> our job setup but it seems to be a global setting since it affects the
> tasktrackers, not the scheduling component.

That's correct.

> Also i've found https://issues.apache.org/jira/browse/HADOOP-5170 on the
> web. It seems to be exactly what I need but the changes seem not to be
> in the current 0.20.2 release which I am using and they also seem to
> involve the JobConf class which for now is marked deprecated.

There are two parts here. Regarding HADOOP-5170, you can see that it
was strongly debated in the JIRA comments. This patch was backed out
of 0.21 (the version it was scheduled to be part of) and the author opted
to submit it as part of the Fair Scheduler rather than Hadoop MR. I'm
not sure of the exact status as to its inclusion in the fair scheduler
code base.

While JobConf (and many related mapred.* classes) are marked
@Deprecated, the reality is that they will probably be un-deprecated
for the next release. They'll be around for a while.

> So I have no idea how to do this without changing the global tasktracker map
> task maximum value and
> restarting the system.

Unfortunately, there's no good way to handle this right now. You can
use the fair scheduler to create two pools with varying max tasks, but
that's cluster-wide, not per host, so I don't think that will be
helpful. A better option is to pack more work into each task in the
"lighter" of your two jobs so they have similar performance
characteristics, if possible. Of course, that is easier said than done.
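
One way to "pack more work into each task" is to make each input split larger, so the lighter job runs fewer, heavier map tasks. The sketch below is only an illustration, assuming the lighter job reads files through a FileInputFormat-based input format and that the minimum split size is raised via the mapred.min.split.size property. The class name SplitSizeSketch, the method names, and the 10 GiB / 64 MiB / 256 MiB figures are hypothetical. It reproduces the old-API split-size rule, max(minSize, min(goalSize, blockSize)), in plain Java rather than calling Hadoop itself.

```java
public class SplitSizeSketch {

    // Old-API FileInputFormat rule: max(minSize, min(goalSize, blockSize)).
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Rough number of map tasks for a single file: ceil(fileSize / splitSize).
    // Hadoop's real count can differ slightly because of its split "slop" factor.
    public static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long fileSize = 10L * 1024L * mib; // hypothetical 10 GiB input file
        long blockSize = 64L * mib;        // hypothetical 64 MiB HDFS block size
        long goalSize = fileSize;          // assumes a map-count hint of 1

        // Minimum split size left at 1 byte: the block size decides the split size.
        long defaultSplit = computeSplitSize(goalSize, 1L, blockSize);

        // Minimum split size raised to 256 MiB (what setting
        // mapred.min.split.size to 268435456 would request).
        long packedSplit = computeSplitSize(goalSize, 256L * mib, blockSize);

        System.out.println("default: " + estimateSplits(fileSize, defaultSplit) + " map tasks");
        System.out.println("packed:  " + estimateSplits(fileSize, packedSplit) + " map tasks");
    }
}
```

Under these assumed numbers the estimate drops from 160 map tasks to 40, each reading four times as much data. Whether that evens out the two jobs depends on the workload. Larger splits also span several blocks, so some reads lose data locality.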

Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com
