hadoop-common-user mailing list archives

From Michael Bieniosek <mich...@powerset.com>
Subject Re: Query about number of task trackers specific to a site
Date Fri, 17 Aug 2007 20:00:34 GMT
I updated the description on the jira ticket.

You can imagine that the cluster could potentially operate in two modes:
1) configure the value for number of parallel tasks once on the jobtracker,
so each tasktracker gets the same number of parallel tasks.  This assumes
that all the machines in the cluster have comparable hardware.
2) configure the value for number of parallel tasks for each tasktracker, so
each tasktracker could potentially get a different number of parallel tasks.
This is what you want for your situation.

When a new hadoop job starts up, the cluster operates in mode 1).  After one
task finishes on the tasktracker, that tasktracker seems to move into mode
2).  
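
To make the two modes concrete, here is a toy simulation of the behavior described above. It is only a sketch and not Hadoop code: the TaskTracker class, fill(), and finish_one() are hypothetical names, and the switch to the local limit after the first finished task is modeled exactly as observed in this thread, not taken from the scheduler source. The per-host limits (4, 2, 2, 6) come from Neeraj's setup, and the jobtracker's value is assumed to be 4 because the jobtracker runs on M1.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    name: str
    local_max: int                 # this host's own mapred.tasktracker.tasks.maximum
    running: int = 0               # tasks currently running on this host
    seen_completion: bool = False  # has any task finished here yet?

    def effective_max(self, jobtracker_max: int) -> int:
        # Mode 1 (jobtracker's value) until a task finishes, then mode 2 (local value).
        return self.local_max if self.seen_completion else jobtracker_max


def fill(trackers, jobtracker_max, pending):
    """Hand out pending tasks until every tracker reaches its effective limit."""
    for t in trackers:
        while pending > 0 and t.running < t.effective_max(jobtracker_max):
            t.running += 1
            pending -= 1
    return pending


def finish_one(tracker):
    """One task completes; from now on the tracker uses its own limit."""
    tracker.running -= 1
    tracker.seen_completion = True


# Assumed setup from the thread: M1=4, M2=2, M3=2, M4=6, jobtracker value 4.
trackers = [TaskTracker("M1", 4), TaskTracker("M2", 2),
            TaskTracker("M3", 2), TaskTracker("M4", 6)]
pending = fill(trackers, 4, 100)
print({t.name: t.running for t in trackers})  # every host starts with 4 tasks

finish_one(trackers[3])                       # a task finishes on M4
pending = fill(trackers, 4, pending)
print({t.name: t.running for t in trackers})  # M4 now fills up to its own limit of 6
```

In this model M2 and M3 stay oversubscribed until enough of their initial tasks finish, which is why many small tasks make the per-host values take effect sooner.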

-Michael

On 8/17/07 12:31 PM, "Mahajan, Neeraj" <nemahajan@ebay.com> wrote:

> Hi Michael,
> 
> Thanks for the prompt reply. I was going through your bug description,
> but it (the second statement) didn't completely make sense to me.
>> When I start a job, hadoop uses mapred.tasktracker.tasks.maximum on the
>> jobtracker. Once these tasks finish, it is the tasktracker's value of
>> mapred.tasktracker.tasks.maximum that decides how many new tasks are
>> created for each host.
> 
> Could you please explain it?
> 
> Thanks,
> Neeraj
> 
> -----Original Message-----
> From: Michael Bieniosek [mailto:michael@powerset.com]
> Sent: Friday, August 17, 2007 11:55 AM
> To: hadoop-user@lucene.apache.org; Mahajan, Neeraj
> Subject: Re: Query about number of task trackers specific to a site
> 
> https://issues.apache.org/jira/browse/HADOOP-1245
> 
> This bug makes it difficult to run hadoop on heterogeneous clusters
> efficiently.  Aside from fixing the bug, your best options are probably:
> 1) split your large heterogeneous cluster into smaller homogeneous
> clusters
> 2) run with lots of small tasks so the tasktracker's value for
> maxCurrentTasks replaces the jobtracker's bad value more quickly.
> 
> -Michael 
> 
> On 8/17/07 11:47 AM, "Mahajan, Neeraj" <nemahajan@ebay.com> wrote:
> 
>> Hi,
>>  
>> In my hadoop setup, say I have 4 machines (M1 - M4). M1 is the master
>> with the Job tracker.
>> Say I want 4 parallel tasks on M1, 2 on M2/M3 and 6 on M4. I set the
>> corresponding property (mapred.tasktracker.tasks.maximum) in
>> hadoop-site.xml for each of the machines.
>> I observed that when all the task trackers start, maxCurrentTasks is
>> loaded correctly. But when I execute a job, I can see that 4
>> TaskTracker$Child processes execute on each of the machines. Any idea
>> what I am missing, or is this a known bug?
>> 
>> Regards,
>> Neeraj

