hadoop-common-user mailing list archives

From "Mahajan, Neeraj" <nemaha...@ebay.com>
Subject RE: Query about number of task trackers specific to a site
Date Fri, 17 Aug 2007 19:31:02 GMT
Hi Michael,

Thanks for the prompt reply. I was going through your bug description,
but it (the second statement) didn't completely make sense to me.
> When I start a job, hadoop uses mapred.tasktracker.tasks.maximum on the
> jobtracker. Once these tasks finish, it is the tasktracker's value of
> mapred.tasktracker.tasks.maximum that decides how many new tasks are
> created for each host.

Could you please explain it?


-----Original Message-----
From: Michael Bieniosek [mailto:michael@powerset.com] 
Sent: Friday, August 17, 2007 11:55 AM
To: hadoop-user@lucene.apache.org; Mahajan, Neeraj
Subject: Re: Query about number of task trackers specific to a site


This bug makes it difficult to run hadoop on heterogeneous clusters
efficiently.  Aside from fixing the bug, your best options are probably:
1) split your large heterogeneous cluster into smaller homogeneous clusters
2) run with lots of small tasks so the tasktracker's value for
maxCurrentTasks replaces the jobtracker's bad value more quickly.


On 8/17/07 11:47 AM, "Mahajan, Neeraj" <nemahajan@ebay.com> wrote:

> Hi,
> In my hadoop setup, say I have 4 machines (M1 - M4). M1 is the master 
> with the Job tracker.
> Say I want 4 parallel tasks on M1, 2 on M2/M3 and 6 on M4. I set the 
> corresponding property (mapred.tasktracker.tasks.maximum) in 
> hadoop-site.xml for each of the machines.
> I observed that when all the task trackers start, maxCurrentTasks is 
> loaded correctly. But when I execute a job, I can see that 4 
> TaskTracker$Child execute on each of the machines. Any idea what I am 
> missing, or is this a known bug?
> Regards,
> Neeraj
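
For reference, the per-machine setting described in the original message
could look roughly like the fragment below. This is only a sketch, assuming
the usual <configuration>/<property> layout of hadoop-site.xml. The value 6
is the one intended for M4. M1 would carry 4 and M2/M3 would carry 2 in
their own copies of the file. As discussed above, the jobtracker's value
may still be used for the first wave of tasks, so this alone may not give
the intended per-host limits.

```xml
<!-- hadoop-site.xml on M4 (sketch; layout assumed, value taken from the thread) -->
<configuration>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <!-- maximum number of tasks this tasktracker should run in parallel -->
    <value>6</value>
  </property>
</configuration>
```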
