hadoop-mapreduce-dev mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Quincy Fair Scheduler Vs Hadoop Fair Scheduler
Date Tue, 21 Jun 2011 20:02:59 GMT
Arun,

Small question: is MR-517 included in the 0.20.203 release? I ask because
your comments suggest it is, and the JIRA may need a 'resolved in:' field
set to match :)

On Wed, Jun 22, 2011 at 12:53 AM, Arun C Murthy <acm@yahoo-inc.com> wrote:
> Venu,
>
>
> On Jun 17, 2011, at 7:16 PM, Venu Gopala Rao wrote:
>>
>> Do these problems get solved in 0.21 or NextGen MapReduce?
>>
>
> The CapacityScheduler in both 0.20.203 and NextGen MR has significantly
> better scheduling for locality, including a highly modified version of the
> 'delay scheduling' algorithm Matei mentioned in his reply.
>
> The highlights are (via MAPREDUCE-517 for 0.20.203 -
> http://tinyurl.com/6e9qlun )
>
> # The CS tracks 'number of scheduling opportunities' missed by a job and
> gets jobs to use that to prevent starvation. AFAIK the FairScheduler uses
> 'time' rather than 'scheduling opportunities'.
> # I've also added 'pace' to the back-off by ensuring jobs back off less
> aggressively as they make progress. At the end of a job, i.e. the last 10
> tasks of a job with 100k tasks, latency is more important than perfect
> data-locality.
> # The implementation also gets small jobs to 'delay' less than larger
> jobs by weighting the delay by the size of the job.
> # The implementation also ensures jobs with no locality requirements, e.g.
> sleep-job/randomwriter, do not have any 'delay', since delaying them makes
> no sense at all.
>
> Since we deployed the new CS (via 0.20.203) we've seen >100% improvement for
> locality of jobs in our production clusters.
>
> The NextGen MR CapacityScheduler has a very similar implementation to the
> one described above.
>
> hth,
> Arun

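For readers following the thread, the four highlights Arun lists can be
sketched roughly as below. This is only an illustration of the idea and not
the CapacityScheduler code from MAPREDUCE-517: the Job fields, the function
names, the cluster_nodes parameter and the threshold formula are all
assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative, not Hadoop's."""
    total_tasks: int
    remaining_tasks: int
    has_locality_prefs: bool = True
    missed_opportunities: int = 0


def delay_threshold(job: Job, cluster_nodes: int) -> int:
    """Missed opportunities a job tolerates before taking a non-local slot.

    The formula is an assumption chosen only to show the shape of the idea.
    """
    if not job.has_locality_prefs or job.total_tasks == 0:
        return 0  # no locality requirement: never delay
    # Small jobs delay less: the wait is capped by the job's size.
    base = min(job.total_tasks, cluster_nodes)
    # Pace: back off less as the job gets closer to completion.
    remaining_fraction = job.remaining_tasks / job.total_tasks
    return int(base * remaining_fraction)


def offer_slot(job: Job, slot_is_local: bool, cluster_nodes: int) -> bool:
    """Return True if the job launches a task on the offered slot."""
    if job.remaining_tasks == 0:
        return False
    threshold = delay_threshold(job, cluster_nodes)
    if slot_is_local or job.missed_opportunities >= threshold:
        # Accept the slot and reset the count of missed opportunities.
        job.missed_opportunities = 0
        job.remaining_tasks -= 1
        return True
    # Skip this non-local slot; count the opportunity, not elapsed time.
    job.missed_opportunities += 1
    return False
```

With these made-up numbers, a 4-task job on a 50-node cluster passes up four
non-local offers and accepts the fifth, while a job with no locality
preferences accepts the first slot it is offered.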


-- 
Harsh J
