hadoop-mapreduce-user mailing list archives

From Sandeep L <sandeepvre...@outlook.com>
Subject RE: Using CapacityScheduler to divide resources between jobs (not users)
Date Tue, 09 Jul 2013 05:00:30 GMT
One solution I can suggest is to use multiple jobtrackers:

Jobtracker1: 2 or 3 machines as tasktrackers
Jobtracker2: around 7 machines as tasktrackers
Jobtracker3: around 10 machines as tasktrackers

You can change the number of tasktracker machines per jobtracker as required and run jobs accordingly.
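If you go this route, each group of tasktrackers is pointed at its own jobtracker via the mapred.job.tracker property in mapred-site.xml. A minimal sketch for one group (the hostname and port are placeholders, not values from this thread):

```xml
<!-- mapred-site.xml on the 2-3 tasktrackers assigned to the first jobtracker -->
<!-- jt1.example.com:9001 is a hypothetical jobtracker address -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jt1.example.com:9001</value>
  </property>
</configuration>
```

The other tasktracker groups would carry the same property pointing at their own jobtrackers. Note the trade-off: separate jobtrackers statically partition the cluster, so idle machines in one partition cannot help jobs queued in another.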

Date: Sat, 6 Jul 2013 17:12:01 +0300
Subject: Using CapacityScheduler to divide resources between jobs (not users)
From: amits@infolinks.com
To: user@hadoop.apache.org

Hi all, 
I'm running Hadoop 1.0.4 on a modest cluster (~20 machines). The jobs running on the cluster can be divided (resource-wise) as follows:

1. Very short jobs: less than 1 minute.
2. Normal jobs: 2-3 minutes, up to an hour or two.
3. Very long jobs: days of processing (still not active, and the reason for my inquiries here).

I was thinking of using the CapacityScheduler to divide the cluster resources so that the
long jobs can run without disturbing the other jobs. I read that such a job queue should also be
given an upper bound, since it may take over the entire cluster's resources once they are free, but
because its jobs take a long time to finish, it won't release them to the other queues as it
should. Is that so?
Any advice on using the CapacityScheduler in this use case?
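For reference, a queue setup along the lines described above might look like the following in conf/capacity-scheduler.xml on Hadoop 1.x. This is a sketch, not a tested configuration: the queue names "short" and "long" and the percentages are illustrative assumptions. The maximum-capacity property is what enforces the upper bound on the long-jobs queue:

```xml
<!-- conf/capacity-scheduler.xml; queue names "short" and "long" are hypothetical -->
<configuration>
  <!-- guaranteed share of the cluster for short/normal jobs -->
  <property>
    <name>mapred.capacity-scheduler.queue.short.capacity</name>
    <value>70</value>
  </property>
  <!-- guaranteed share for the long jobs -->
  <property>
    <name>mapred.capacity-scheduler.queue.long.capacity</name>
    <value>30</value>
  </property>
  <!-- hard cap so the long queue never grows beyond half the cluster,
       even when the rest of it is idle -->
  <property>
    <name>mapred.capacity-scheduler.queue.long.maximum-capacity</name>
    <value>50</value>
  </property>
</configuration>
```

The queues themselves are declared via mapred.queue.names in mapred-site.xml, and a job is submitted to a particular queue with -Dmapred.job.queue.name=long.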
Thanks, and sorry for re-sending this message.