hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: Handling Non Homogenous tasks via Hadoop
Date Tue, 07 Apr 2009 18:30:48 GMT

The mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum properties can be set on a
per-host basis in each node's hadoop-site.xml file. This lets you configure
nodes with more or fewer cores, more or less RAM, etc. to run more or fewer
concurrent tasks.
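
As a rough sketch of the per-host settings described above, a larger node
might carry something like the following in its own hadoop-site.xml. The
values 6 and 2 are illustrative assumptions, not recommendations from this
thread; a smaller node would simply use lower numbers in its copy of the file.

```xml
<?xml version="1.0"?>
<!-- hadoop-site.xml on one (hypothetical) larger node -->
<configuration>
  <!-- Maximum number of map tasks this TaskTracker runs at once.
       The value 6 is an assumed example. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <!-- Maximum number of reduce tasks this TaskTracker runs at once.
       The value 2 is an assumed example. -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```

These limits are static slot counts per node. As far as I know, the
TaskTracker on that node has to be restarted before it picks up a change.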

There is currently no mechanism, though, for feeding actual machine
utilization back to the task scheduler in real time.

- Aaron

On Tue, Apr 7, 2009 at 7:54 AM, amit handa <amhanda@gmail.com> wrote:

> Hi,
> Is there a way to control the number of tasks that can be spawned on a
> machine, based on the machine's capacity and how loaded it already is?
> My use case is as follows:
> I have to perform task1, task2, task3, ..., task n.
> These tasks have varied CPU and memory usage patterns.
> All tasks of type task1 and task3 can take 80-90% CPU and 800 MB of RAM.
> All tasks of type task2 take only 1-2% CPU and 5-10 MB of RAM.
> How do I model this using Hadoop? Can I use a single cluster to run
> all of these task types?
> Or should I use a different Hadoop cluster for each task type? If so, how
> do I share data between these tasks (the data can be a few MB to a few GB)?
> Please suggest an approach or point me to any docs I can dig into.
> Thanks,
> Amit
