hadoop-mapreduce-dev mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject Re: Quincy Fair Scheduler Vs Hadoop Fair Scheduler
Date Tue, 21 Jun 2011 19:23:01 GMT

On Jun 17, 2011, at 7:16 PM, Venu Gopala Rao wrote:
> Does these problems get solved in the 0.21 or Next Gen Map Reduce?

The CapacityScheduler in both 0.20.203 and NextGen MR has
significantly better scheduling for locality, including a highly
modified version of the 'delay scheduling' algorithm Matei mentioned
in his reply.

The highlights are (via MAPREDUCE-517 for 0.20.203 - http://tinyurl.com/6e9qlun):

# The CS tracks 'number of scheduling opportunities' missed by a job  
and gets jobs to use that to prevent starvation. AFAIK the  
FairScheduler uses 'time' rather than 'scheduling opportunities'.
# I've also added 'pace' to the back-off by ensuring jobs do not
back off as aggressively as they make progress. At the end of a
job, e.g. the last 10 tasks of a job with 100k tasks, latency is more
important than perfect data-locality.
# The implementation also gets small jobs to 'delay' less vis-a-vis
larger jobs by weighing the size of the job.
# The implementation also ensures jobs with no locality requirements,
e.g. sleep-job/randomwriter, do not have any 'delay', since delaying
makes no sense for them.
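
To make the four points above concrete, here is a rough sketch in Python. It is not the CapacityScheduler code from MAPREDUCE-517: the function names, the parameters and the threshold formula are all assumptions made up for illustration. It only shows the shape of the idea, i.e. a job declines non-local slots until it has missed "enough" scheduling opportunities, and "enough" shrinks for small jobs and for jobs that are nearly done.

```python
def allowed_misses(remaining_tasks, total_tasks, num_nodes):
    """Hypothetical cap on how many scheduling opportunities a job may
    skip while waiting for a data-local slot. Integer arithmetic only."""
    if total_tasks <= 0:
        return 0
    # Size weighting (assumed): a job with fewer tasks than the cluster
    # has nodes waits proportionally less.
    base = min(total_tasks, num_nodes)
    # Pace (assumed): scale by the fraction of work still remaining, so
    # the job backs off less as it makes progress.
    return base * remaining_tasks // total_tasks


def should_accept(slot_is_local, needs_locality, missed,
                  remaining_tasks, total_tasks, num_nodes):
    """Decide whether a job takes the slot it is being offered."""
    # Jobs without locality requirements never delay; local slots are
    # always taken.
    if not needs_locality or slot_is_local:
        return True
    # Otherwise take a non-local slot only once the job has missed at
    # least its allowed number of opportunities (prevents starvation).
    return missed >= allowed_misses(remaining_tasks, total_tasks, num_nodes)
```

With these made-up numbers, a 100k-task job on a 100-node cluster would skip up to 100 opportunities at the start, but none for its last 10 tasks, where latency matters more than locality. Counting opportunities rather than wall-clock time means the delay adapts to how quickly slots are actually being offered.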

Since we deployed the new CS (via 0.20.203) we've seen >100%  
improvement for locality of jobs in our production clusters.

The NextGen MR CapacityScheduler has a very similar implementation to  
the one described above.
