hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "BristolHadoopWorkshopSpring2010" by SteveLoughran
Date Thu, 01 Apr 2010 18:33:13 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "BristolHadoopWorkshopSpring2010" page has been changed by SteveLoughran.
The comment on this change is: dynamic priority scheduler.
http://wiki.apache.org/hadoop/BristolHadoopWorkshopSpring2010?action=diff&rev1=5&rev2=6

--------------------------------------------------

  
  Sanders' talk triggered an interesting discussion on whether the Grid model had delivered on what it had promised. The answer: some things got addressed, but others (notably storage) had been ignored, and turned out to be rather important.
  
+ == Thomas Sandholm: Economic Scheduling of Hadoop Jobs ==
+ 
+ [[http://www.slideshare.net/steve_l/economic-scheduling-of-hadoop-jobs|Slides]]
+ 
+ Thomas Sandholm joined us via videoconference to talk about the scheduler that he and Kevin Lai wrote.
+ 
+  * The two main schedulers are optimised for the Yahoo! and Facebook workloads respectively. Although they are converging, they are tuned differently; both teams are wary of changes that would reduce their throughput, as the cost would be significant.
+  * Hadoop 0.21 adds a plugin API that makes it easier to add your own scheduler.
+  * The DynamicPriorityScheduler is designed for multiple users competing for time on a shared cluster.
+  * You bid for time; the scheduler gives priority to those who bid the most.
+  * You can bid $0; you will still get time if nobody else bids more than you.
+  * Running map or reduce tasks will be killed if higher-priority work comes in. The scheduler tries to be clever here and leaves alone tasks that have been running for a while, on the expectation that they will finish soon. Killing tasks pays off when people schedule long-running jobs.
+  * It keeps no history, which makes it scalable: there is no persistent state to worry about.
+  * If your bid doesn't get through, you don't get billed.
+  * To use it: give every user/team their own queue.
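The bidding rules above can be sketched as follows. This is an illustrative toy model, not the actual DynamicPriorityScheduler code; the function name, the bid-proportional split, and the even-split fallback for all-zero bids are assumptions for illustration only.

```python
def allocate_slots(bids, total_slots):
    """Toy model of bid-based scheduling: each queue gets cluster slots
    in proportion to its bid. A $0 bidder still gets slots when nobody
    else bids anything; once others bid, its share drops to zero."""
    total_bid = sum(bids.values())
    if total_bid == 0:
        # Nobody bid anything: split the slots evenly among the queues.
        share = total_slots // len(bids)
        return {queue: share for queue in bids}
    # Integer share of slots proportional to each queue's bid.
    return {queue: (total_slots * bid) // total_bid
            for queue, bid in bids.items()}
```

For example, `allocate_slots({"research": 3, "etl": 1, "adhoc": 0}, 40)` gives research 30 slots, etl 10, and adhoc 0 — but `allocate_slots({"adhoc": 0}, 40)` alone would still give adhoc all 40, matching the "$0 bids get time when unopposed" rule.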
+ 
+ The scheduler is in the contrib directory for Hadoop 0.21; it's not easy to backport as it uses the scheduler plugin API.
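Enabling a pluggable scheduler means pointing the JobTracker at it in `mapred-site.xml`. A hedged sketch only: the class name below is assumed from the contrib layout and may differ; check the README shipped with the 0.21 contrib package for the exact property and class names.

```xml
<!-- Assumed configuration: verify names against the contrib README. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.DynamicPriorityScheduler</value>
</property>
```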
+ 
