hadoop-common-user mailing list archives

From Michael Bieniosek <mich...@powerset.com>
Subject Re: Query about number of task trackers specific to a site
Date Fri, 17 Aug 2007 20:23:09 GMT
Well, it's been a while since I filed that bug, so it's possible that things
have changed, or that I don't remember the circumstances correctly.

Sorry!

-Michael 

On 8/17/07 1:15 PM, "Mahajan, Neeraj" <nemahajan@ebay.com> wrote:

> Hmm ..
> I am not observing the second behavior. I ran a job of more than 500
> tasks. Each task tracker executed many tasks, but at all times I could
> see that 4 child processes were running on each machine.
> 
> ~ Neeraj 
> 
> -----Original Message-----
> From: Michael Bieniosek [mailto:michael@powerset.com]
> Sent: Friday, August 17, 2007 1:01 PM
> To: Mahajan, Neeraj; hadoop-user@lucene.apache.org
> Subject: Re: Query about number of task trackers specific to a site
> 
> I updated the description on the jira ticket.
> 
> You can imagine that the cluster could potentially operate in two modes:
> 1) configure the value for number of parallel tasks once on the
> jobtracker, so each tasktracker gets the same number of parallel tasks.
> This assumes that all the machines in the cluster have comparable
> hardware.
> 2) configure the value for number of parallel tasks for each
> tasktracker, so each tasktracker could potentially get a different
> number of parallel tasks.
> This is what you want for your situation.
> 
> When a new hadoop job starts up, the cluster operates in mode 1).  After
> one task finishes on the tasktracker, that tasktracker seems to move
> into mode 2).  
> 
> -Michael
> 
> On 8/17/07 12:31 PM, "Mahajan, Neeraj" <nemahajan@ebay.com> wrote:
> 
>> Hi Michael,
>> 
>> Thanks for the prompt reply. I was going through your bug description,
>> but it (the second statement) didn't completely make sense to me.
>>> When I start a job, hadoop uses mapred.tasktracker.tasks.maximum
>>> on the jobtracker. Once these tasks finish, it is the tasktracker's
>>> value of mapred.tasktracker.tasks.maximum that decides how many new
>>> tasks are created for each host.
>> 
>> Could you please explain it.
>> 
>> Thanks,
>> Neeraj
>> 
>> -----Original Message-----
>> From: Michael Bieniosek [mailto:michael@powerset.com]
>> Sent: Friday, August 17, 2007 11:55 AM
>> To: hadoop-user@lucene.apache.org; Mahajan, Neeraj
>> Subject: Re: Query about number of task trackers specific to a site
>> 
>> https://issues.apache.org/jira/browse/HADOOP-1245
>> 
>> This bug makes it difficult to run hadoop on heterogeneous clusters
>> efficiently.  Aside from fixing the bug, your best options are
>> probably:
>> 1) split your large heterogeneous cluster into smaller homogeneous
>> clusters
>> 2) run with lots of small tasks so the tasktracker's value for
>> maxCurrentTasks replaces the jobtracker's bad value more quickly.
>> 
>> -Michael
>> 
>> On 8/17/07 11:47 AM, "Mahajan, Neeraj" <nemahajan@ebay.com> wrote:
>> 
>>> Hi,
>>>  
>>> In my hadoop setup, say I have 4 machines (M1 - M4). M1 is the master
>>> with the Job tracker.
>>> Say I want 4 parallel tasks on M1, 2 on M2/M3 and 6 on M4. I set the
>>> corresponding property (mapred.tasktracker.tasks.maximum) in
>>> hadoop-site.xml for each of the machines.
>>> I observed that when all the task trackers start, maxCurrentTasks is
>>> loaded correctly. But when I execute a job, I can see that 4
>>> TaskTracker$Child processes execute on each of the machines. Any idea
>>> what I am missing, or is this a known bug?
>>> 
>>> Regards,
>>> Neeraj
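
For reference, the per-machine setting described in the original question might look roughly like the fragment below. This is only a sketch: the property name, the file name (hadoop-site.xml), and the value 6 for M4 come from the thread, while the surrounding layout assumes the usual Hadoop name/value property format. Whether the tasktracker's own value is actually honored during a job is exactly what HADOOP-1245 and this thread leave in question.

```xml
<?xml version="1.0"?>
<!-- Sketch of hadoop-site.xml on M4; M1 would use 4, and M2/M3 would use 2. -->
<configuration>
  <property>
    <!-- Maximum number of tasks this tasktracker runs in parallel -->
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>6</value>
  </property>
</configuration>
```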

