hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Mapred job parallelism
Date Mon, 26 Jan 2009 21:43:07 GMT
I believe that the scheduler code in 0.19.0 has a framework for this, but I
haven't dug into it in detail yet.

http://hadoop.apache.org/core/docs/r0.19.0/capacity_scheduler.html

From what I gather, you would set up 2 queues, each with guaranteed access to
1/2 of the cluster.
Then you submit your jobs to alternate queues.

This is not ideal, as you have to balance which queue you submit jobs to, to
ensure that each queue has some depth.
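
A rough sketch of that balancing on the submitting side, in case it helps. It
keeps a count of unfinished jobs per queue and hands the next job to the queue
with the fewest, which reduces to plain alternation when jobs finish evenly.
Assumptions on my part, not taken from the scheduler docs: the class
QueueBalancer and the queue names "queueA" and "queueB" are made up for
illustration, and I am assuming the job's queue is selected through the
mapred.job.queue.name property. The sketch does not call any Hadoop API; it
only builds the property you would copy into the job configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: decides which scheduler queue the next job goes to. */
final class QueueBalancer {
    // Submitted-but-unfinished job count per queue, kept in declaration order.
    private final Map<String, Integer> pending = new LinkedHashMap<>();

    QueueBalancer(String... queueNames) {
        if (queueNames.length == 0) {
            throw new IllegalArgumentException("need at least one queue");
        }
        for (String name : queueNames) {
            pending.put(name, 0);
        }
    }

    /** Picks the queue with the fewest pending jobs (first wins on ties) and counts the new job. */
    synchronized String nextQueue() {
        String best = null;
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (best == null || e.getValue() < pending.get(best)) {
                best = e.getKey();
            }
        }
        pending.merge(best, 1, Integer::sum);
        return best;
    }

    /** Call when a job submitted to the given queue has completed. */
    synchronized void jobFinished(String queue) {
        Integer n = pending.get(queue);
        if (n == null) {
            throw new IllegalArgumentException("unknown queue: " + queue);
        }
        if (n > 0) {
            pending.put(queue, n - 1);
        }
    }

    /** Properties to copy into the job configuration (property name is an assumption). */
    static Map<String, String> jobProperties(String queue) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.job.queue.name", queue);
        return props;
    }
}
```

You would create one QueueBalancer("queueA", "queueB") in whatever process
submits the jobs, call nextQueue() before each submission, and call
jobFinished() when a job completes so the counts stay accurate.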


On Mon, Jan 26, 2009 at 1:30 PM, Sagar Naik <snaik@attributor.com> wrote:

> Hi Guys,
>
> I was trying to set up a cluster so that two jobs can run simultaneously.
>
> The conf :
> number of nodes : 4(say)
> mapred.tasktracker.map.tasks.maximum=2
>
>
> and in the JobClient
> mapred.map.tasks=4 (# of nodes)
>
>
> I also have a condition, that each job should have only one map-task per
> node
>
> In short, created 8 map slots and set the number of mappers to 4.
> So now, we have two jobs running simultaneously
>
> However, I realized that, if a tasktracker happens to die, potentially, I
> will have 2 map-tasks running on a node
>
>
> Setting mapred.tasktracker.map.tasks.maximum=1 in the JobClient has no effect.
> It is a tasktracker property and can't be changed per job.
>
> Any ideas on how to have 2 jobs running simultaneously ?
>
>
> -Sagar
