From: "Thomas Sandholm (JIRA)"
To: core-dev@hadoop.apache.org
Date: Fri, 5 Dec 2008 23:29:44 -0800 (PST)
Subject: [jira] Commented: (HADOOP-4768) Dynamic Priority Scheduler that allows queue shares to be controlled dynamically by a currency
Message-ID: <434892622.1228548584377.JavaMail.jira@brutus>
In-Reply-To: <1679101330.1228355984160.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654038#action_12654038 ]

Thomas Sandholm commented on HADOOP-4768:
-----------------------------------------

Hi Matei,
when I was implementing this I played around with a number of different approaches. The goals were to make the dynamic scheduler as independent of the underlying schedulers as possible, and to require as few changes to them as possible. I didn't want to add a separately deployed service, as this introduces another point of failure and maintenance. So instead I hooked into the schedulers' regular event/bookkeeping loop, without even requiring a separate thread to be spawned. Updating the config files of the schedulers (pools file and capacity-scheduler) asynchronously, without requiring any changes at all to the schedulers, is something I also tried. It was quite simple for the fairshare scheduler, but it introduces a dependency on the XML format if you don't want to do some XPath-like replacement (which turned out to be both too complex and too slow for our purpose). Updating the config files would lead to more I/O overhead too; now the shares are communicated directly to the schedulers in memory. I don't think the reverse dependency is too bad either: the scheduler just gets a list of queue/share values from a config property and can then utilize those in whatever way makes sense to the local scheduler. My patches to the capacity scheduler and the fairshare scheduler should be seen as examples for scheduler developers of how to utilize the dynamic scheduler rather than as final solutions.

The important thing is that the dynamic scheduler allows control over and accounts for budget spent on different levels of quality of service/priority.
This QoS/priority can then be enforced and implemented in any number of ways, the dynamic scheduler doesn't care, as long as spending more currency per time unit will give you better performance.

Thanks for the more detailed info on the fairshare scheduler, I still think that the guaranteed allocations were the best match, but if it makes sense to pay more currency for higher fair-shares you could enforce the shares granted by the dynamic scheduler in a more sophisticated way. I don't think the interface between the schedulers has to change for this to be done though.

One use case is that you could hook this feature into a secure banking system where budgets can be transferred from the user to the cluster owner automatically. We have used this approach successfully in a system called Tycoon (http://tycoon.hpl.hp.com) but instead of allocating map/reduce task slots it allocates virtual machine shares using Xen (like EC2 but with variable pricing and finer grained resource control).

Another use case is a cloud computing test bed that we are designing together with Intel and Yahoo (that I presented at the venues mentioned in the patch description). In this scenario researchers are granted some quota, e.g. based on their contribution to the testbed. The quota can then be used by them to obtain resources when they need them and at a QoS level that matches their needs.

Hope this clarifies things a bit.
If you want more info on the big picture you can look at some of the papers and presentations on the Tycoon site mentioned above or the test bed site, www.opencirrus.org (under construction).

> Dynamic Priority Scheduler that allows queue shares to be controlled dynamically by a currency
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4768
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4768
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/capacity-sched, contrib/fair-share
>    Affects Versions: 0.20.0
>            Reporter: Thomas Sandholm
>            Assignee: Thomas Sandholm
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-4768-capacity-scheduler.patch, HADOOP-4768-dynamic-scheduler.patch, HADOOP-4768-fairshare.patch
>
>
> Contribution based on work presented at the Hadoop User Group meeting in Santa Clara in September and the HadoopCamp in New Orleans in November.
> From README:
> This package implements dynamic priority scheduling for MapReduce jobs.
> Overview
> --------
> The purpose of this scheduler is to allow users to increase and decrease
> their queue priorities continuously to meet the requirements of their
> current workloads. The scheduler is aware of the current demand and makes
> it more expensive to boost the priority under peak usage times. Thus
> users who move their workload to low usage times are rewarded with
> discounts. Priorities can only be boosted within a limited quota.
> All users are given a quota or a budget which is deducted periodically
> in configurable accounting intervals. How much of the budget is
> deducted is determined by a per-user spending rate, which may
> be modified at any time directly by the user. The cluster slots
> share allocated to a particular user is computed as that user's
> spending rate over the sum of all spending rates in the same accounting
> period.
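
The share rule in the Overview (a user's spending rate over the sum of all spending rates in the accounting period) can be sketched as below. This is only an illustration and is not taken from the HADOOP-4768 patches: the class name ShareCalculator, the method computeShares, and the choice to give every queue a zero share when nobody is spending are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; not code from the HADOOP-4768 patches. */
final class ShareCalculator {
    private ShareCalculator() {}

    /**
     * Returns each queue's share of the cluster slots as a fraction in [0, 1]:
     * the queue's spending rate divided by the sum of all spending rates.
     */
    static Map<String, Double> computeShares(Map<String, Double> spendingRates) {
        // Sum the spending rates of all queues for this accounting period.
        double total = 0.0;
        for (double rate : spendingRates.values()) {
            total += rate;
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : spendingRates.entrySet()) {
            // Assumption: if nobody is spending, every queue gets a zero share.
            shares.put(e.getKey(), total > 0.0 ? e.getValue() / total : 0.0);
        }
        return shares;
    }
}
```

With spending rates of 9 for queue1 and 11 for queue2, this gives queue1 a share of 0.45 and queue2 a share of 0.55. Whether the real scheduler reports fractions or percentages is not something this sketch settles.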
> Configuration
> -------------
> This scheduler has been designed as a meta-scheduler on top of
> existing MapReduce schedulers, which are responsible for enforcing
> shares computed by the dynamic scheduler in the cluster. The configuration
> of this MapReduce scheduler does not have to change when deploying
> the dynamic scheduler.
> Hadoop Configuration (e.g. hadoop-site.xml):
>  mapred.jobtracker.taskScheduler      This needs to be set to
>                                       org.apache.hadoop.mapred.DynamicPriorityScheduler
>                                       to use the dynamic scheduler.
>  mapred.queue.names                   All queues managed by the dynamic scheduler must be listed
>                                       here (comma separated no spaces)
> Scheduler Configuration:
>  mapred.dynamic-scheduler.scheduler   The Java path of the MapReduce scheduler that should
>                                       enforce the allocated shares.
>                                       Has been tested with:
>                                       org.apache.hadoop.mapred.FairScheduler
>                                       and
>                                       org.apache.hadoop.mapred.CapacityTaskScheduler
>  mapred.dynamic-scheduler.budgetfile  The full OS path of the file from which the
>                                       budgets are read. The syntax of this file is:
>
>                                       separated by newlines where budget can be specified
>                                       as a Java float
>  mapred.dynamic-scheduler.spendfile   The full OS path of the file from which the
>                                       user/queue spending rate is read. It allows
>                                       the queue name to be placed into the path
>                                       at runtime, e.g.:
>                                       /home/%QUEUE%/.spending
>                                       Only the user(s) who submit jobs to the
>                                       specified queue should have write access
>                                       to this file. The syntax of the file is
>                                       just:
>
>                                       where the spending rate is specified as a
>                                       Java float. If no spending rate is specified
>                                       the rate defaults to budget/1000.
>  mapred.dynamic-scheduler.alloc       Allocation interval, when the scheduler rereads the
>                                       spending rates and recalculates the cluster shares.
>                                       Specified as seconds between allocations.
>                                       Default is 20 seconds.
>  mapred.dynamic-scheduler.budgetset   Boolean which is true if the budget should be deducted
>                                       by the scheduler and the updated budget written to the
>                                       budget file. Default is true. Setting this to false is
>                                       useful if there is a tool that controls budgets and
>                                       spending rates externally to the scheduler.
> Runtime Configuration:
>  mapred.scheduler.shares              The shares that should be allocated to the specified queue.
>                                       The configuration property is a comma separated list of
>                                       strings where the odd positioned elements are the
>                                       queue names and the even positioned elements are the shares
>                                       as Java floats of the preceding queue name. It is updated
>                                       for all the queues atomically in each allocation pass. MapReduce
>                                       schedulers such as the Fair and CapacityTask schedulers
>                                       are expected to read from this property periodically.
>                                       Example property value: "queue1,45.0,queue2,55.0"
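
To make the mapred.scheduler.shares format concrete, here is a minimal sketch of how an underlying scheduler might turn the property value into queue/share pairs. It is not taken from the capacity or fairshare patches: the class name SharesProperty, the method parse, and the handling of empty or malformed values are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; not code from the HADOOP-4768 patches. */
final class SharesProperty {
    private SharesProperty() {}

    /**
     * Parses a value such as "queue1,45.0,queue2,55.0" into an ordered map
     * from queue name to share. Odd positioned elements are queue names and
     * even positioned elements are the shares of the preceding queue name.
     */
    static Map<String, Float> parse(String value) {
        Map<String, Float> shares = new LinkedHashMap<>();
        // Assumption: an unset or empty property means no shares were assigned.
        if (value == null || value.trim().isEmpty()) {
            return shares;
        }
        String[] parts = value.split(",");
        // Assumption: a dangling queue name without a share is rejected.
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("Expected name,share pairs but got: " + value);
        }
        for (int i = 0; i < parts.length; i += 2) {
            shares.put(parts[i].trim(), Float.parseFloat(parts[i + 1].trim()));
        }
        return shares;
    }
}
```

For the example value from the README, parse returns queue1 mapped to 45.0 and queue2 mapped to 55.0. A scheduler would presumably fetch the string from its job configuration on each pass, since the property is rewritten in every allocation interval.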