hadoop-hdfs-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject Using CapacityScheduler to divide resources between jobs (not users)
Date Sat, 06 Jul 2013 14:12:01 GMT
Hi all,

I'm running Hadoop 1.0.4 on a modest cluster (~20 machines).
The jobs running on the cluster can be divided (resource-wise) as follows:

1. Very short jobs: less than 1 minute.
2. Normal jobs: 2-3 minutes up to an hour or two.
3. Very long jobs: days of processing (not yet running, and the reason for
my question here).

I was thinking of using the CapacityScheduler to divide the cluster
resources so that the long jobs can run without disturbing the other jobs.
I read that the queue for such jobs should also have an upper bound: once the
cluster is free, a long job may take over all of its resources, and because
it takes so long to finish, it won't release them to the other queues when
they need them. Is that so?
Any advice about using the CapacityScheduler for this use case?
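
To make the question concrete, here is a rough sketch of the kind of setup I
have in mind, not a tested configuration. The queue names (short, normal,
long) and all percentages are placeholders I made up for illustration. I am
assuming the Hadoop 1.x property names below (mapred.queue.names and the
mapred.capacity-scheduler.queue.<queue-name>.* settings) behave as described
in the CapacityScheduler guide.

```xml
<!-- mapred-site.xml (fragment): enable the scheduler and declare the queues.
     Queue names are hypothetical. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>short,normal,long</value>
</property>

<!-- capacity-scheduler.xml (fragment): guaranteed share per queue, in percent
     of cluster slots. The three values are assumed examples summing to 100. -->
<property>
  <name>mapred.capacity-scheduler.queue.short.capacity</name>
  <value>20</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.normal.capacity</name>
  <value>50</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.long.capacity</name>
  <value>30</value>
</property>
<!-- Upper bound for the long queue, so that it cannot grow to the whole
     cluster when the other queues are idle. 40 is an arbitrary example. -->
<property>
  <name>mapred.capacity-scheduler.queue.long.maximum-capacity</name>
  <value>40</value>
</property>
```

The long jobs would then be submitted with mapred.job.queue.name=long. My
understanding is that without maximum-capacity the long queue could borrow
all idle slots, which is exactly what I would like to prevent.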

Thanks, and sorry for re-sending this message.

Amit.
