hadoop-common-user mailing list archives

From amit handa <amha...@gmail.com>
Subject Handling Non-Homogeneous Tasks via Hadoop
Date Tue, 07 Apr 2009 14:54:18 GMT

Is there a way I can control the number of tasks spawned on a machine, based on
the machine's capacity and how loaded it already is?

My use case is as follows:

I have to perform task1, task2, task3, ... taskN.
These tasks have varied CPU and memory usage patterns.
All tasks of type task1 and task3 can take 80-90% CPU and 800 MB of RAM,
while tasks of type task2 take only 1-2% CPU and 5-10 MB of RAM.

How do I model this using Hadoop? Can I use a single cluster to run all of
these task types? Or should I use a separate Hadoop cluster for each task
type? If so, how do I share data between the tasks (the data can range from a
few MB to a few GB)?

Please suggest an approach, or point me to any docs I can dig into.
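For context, the only per-node control I have found so far is the static slot
count in hadoop-site.xml. (Property names are from the 0.19/0.20-era docs; the
values below are purely illustrative, not a recommendation:)

```xml
<!-- hadoop-site.xml: static per-TaskTracker limits (illustrative values) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- max map tasks run concurrently on this node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value> <!-- max reduce tasks run concurrently on this node -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx800m</value> <!-- heap given to each spawned task JVM -->
</property>
```

These limits are fixed per node, though, and make no distinction between the
heavy task1/task3 jobs and the lightweight task2 jobs, which is exactly my
problem.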

