hadoop-common-dev mailing list archives

From: Andreas Kostyrka <andr...@kostyrka.org>
Subject: Re: How can I control Number of Mappers of a job?
Date: Thu, 31 Jul 2008 23:55:05 GMT
Well, the only way I've found to reliably fix the number of map tasks is to
use compressed input files; that forces Hadoop to assign one and only one
file to each map task ;)
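
A less hacky variant of the same trick, if you control the job code: subclass
the input format and mark files as non-splittable, so the framework hands each
file to exactly one map task. A minimal sketch against the old
org.apache.hadoop.mapred API (the class name is illustrative, not from this
thread):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // One map task per input file: declaring files non-splittable has the
    // same effect as gzip'ing the input, without the compression.
    public class WholeFileTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;  // each file becomes exactly one split, one map
        }
    }

Then wire it in with conf.setInputFormat(WholeFileTextInputFormat.class).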


On Thursday 31 July 2008 21:30:33 Gopal Gandhi wrote:
> Thank you, finally someone has taken an interest in my questions =)
> My cluster contains more than one machine. Please don't get me wrong :-). I
> don't want to limit the total mappers on one node (via mapred.map.tasks).
> What I want is to limit the total mappers for one job. The motivation is
> that I have 2 jobs to run at the same time, and they have the same input
> data in Hadoop. I found that one job has to wait until the other finishes
> its mapping. Because the 2 jobs are submitted by 2 different people, I
> don't want either job to starve. So I want to limit the first job's total
> mappers so that the 2 jobs will run simultaneously.
> ----- Original Message ----
> From: "Goel, Ankur" <ankur.goel@corp.aol.com>
> To: core-user@hadoop.apache.org
> Cc: core-dev@hadoop.apache.org
> Sent: Wednesday, July 30, 2008 10:17:53 PM
> Subject: RE: How can I control Number of Mappers of a job?
> How big is your cluster? Assuming you are running a single-node cluster:
> hadoop-default.xml has a parameter 'mapred.map.tasks' that is set to 2,
> so by default, no matter how many map tasks the framework calculates,
> only 2 map tasks will execute on a single-node cluster.
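
A quick note on the above: mapred.map.tasks is only a hint to the framework,
not a hard cap; the number of maps actually launched is driven by the input
splits. The job-code equivalent of that parameter (a sketch; MapHintDemo is a
placeholder driver class) is:

    import org.apache.hadoop.mapred.JobConf;

    public class MapHintDemo {
        public static void main(String[] args) {
            JobConf conf = new JobConf(MapHintDemo.class);
            // A hint only: the framework may still run one map per split.
            conf.setNumMapTasks(2);  // same as setting mapred.map.tasks
        }
    }
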
> -----Original Message-----
> From: Gopal Gandhi [mailto:gopal.gandhi2008@yahoo.com]
> Sent: Thursday, July 31, 2008 4:38 AM
> To: core-user@hadoop.apache.org
> Cc: core-dev@hadoop.apache.org
> Subject: How can I control Number of Mappers of a job?
> The motivation is to control the max # of mappers of a job. For example,
> if the input data is 246MB, divided by 64MB blocks that gives 4 blocks,
> so by default 4 mappers will be launched, one per block.
> What I want is to set its max # of mappers to 2, so that 2 mappers are
> launched first and, when they complete on the first 2 blocks, another 2
> mappers start on the remaining 2 blocks. Does Hadoop provide a way?
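
For what it's worth: stock Hadoop of this vintage has no per-job cap on
concurrent map tasks; mapred.tasktracker.map.tasks.maximum caps concurrent
maps per node, but across all jobs. And the mapred.map.tasks hint can raise
the number of splits but never push it below totalSize / blockSize, which is
why you get 4 maps here no matter what. A simplified sketch of the
FileInputFormat.getSplits() arithmetic (from memory, not the exact code):

    public class SplitMath {
        public static void main(String[] args) {
            long totalSize = 246L << 20;             // 246 MB of input
            long blockSize = 64L << 20;              // 64 MB DFS blocks
            long goalSize  = totalSize / 2;          // mapred.map.tasks = 2
            long splitSize = Math.min(goalSize, blockSize);  // 64 MB wins
            long numSplits = (totalSize + splitSize - 1) / splitSize;
            System.out.println(numSplits);           // prints 4, not 2
        }
    }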
