hadoop-common-dev mailing list archives

From "Vivek Ratan" <viv...@yahoo-inc.com>
Subject RE: How should we handle queue configuaration in hadoop?
Date Tue, 03 Jun 2008 08:54:58 GMT
I agree with you, Arkady, that the RM can be thought of as existing
outside of Hadoop, and handling more than just MR jobs (see my response
to Doug's comment in HADOOP-3421). It'll likely get there some day, but
there are many steps along the way. For Version 1, to get something out
soon, it makes a lot of sense to add the queueing functionality to the
JT, which already handles scheduling. At some point soon after, I
suppose we can move the scheduling code away from the JT, into perhaps a
separate class or library. But, to reiterate, we've been trying hard to
keep the scheduling/resource management modular and separate enough so
that it can be made more generic and pluggable. 

My questions were asked with this assumption - that the queue/Org
functionality is added to today's JT. I think it's fair to assume that
this may not be a permanent solution. But it may be a long while before
we have a generic Resource Manager that sits outside Hadoop, so we still
need a decent solution to these questions. 

As to whether it makes sense to enhance the JT for V1, or start
focussing on a more generic solution right away (it seemed like that's
what you were suggesting), maybe we can use HADOOP-3444 for that.

> -----Original Message-----
> From: arkady borkovsky [mailto:arkady@yahoo-inc.com] 
> Sent: Tuesday, June 03, 2008 12:03 PM
> To: core-dev@hadoop.apache.org
> Subject: Re: How should we handle queue configuaration in hadoop? 
> HADOOP-3421 seems to combine 2 sorts of requirements:
>     (a) requirements for resource manager for Hadoop (map reduce) jobs
>     (b) requirements for operations that can be performed on 
> Hadoop (map reduce) jobs necessary to implement (a) and a 
> certain class of other "resource managers" with potentially 
> different requirements.
> One can imagine a resource manager being run on a single 
> machine outside a Hadoop grid.  It would have a local 
> database of jobs, queues, etc and their status.  And it would 
> talk to Hadoop to get the current state of the running jobs, 
> and to request to do something with specific jobs -- kill, 
> put to sleep, change the resource quota, etc.  Its 
> counterpart on Hadoop grid would not need to know all the 
> concepts that the "resource manager" operates with.
> Such layered design may make it easier to answer os of the questions.
> E.g. it kind of implies that (b) is part of Hadoop and 
> belongs to JIRA, while (a) may be completely specific for 
> different organizations.
> 2c
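
One way to picture the layering described above, as a minimal Python sketch rather than anything from Hadoop itself. Every name here (GridJobControl, InMemoryGrid, ExternalResourceManager, max_running) is hypothetical, and the "at most N running jobs per queue" policy is only a stand-in for whatever policy an organization would actually want. The point is that the grid side (b) exposes a few job operations and knows nothing about queues, while the external manager (a) keeps its own local view of jobs and queues.

```python
from abc import ABC, abstractmethod


class GridJobControl(ABC):
    """(b): the small set of job operations the grid side would expose (hypothetical)."""

    @abstractmethod
    def running_jobs(self): ...

    @abstractmethod
    def kill(self, job_id): ...

    @abstractmethod
    def suspend(self, job_id): ...

    @abstractmethod
    def set_quota(self, job_id, quota): ...


class InMemoryGrid(GridJobControl):
    """Stand-in for a real grid, used only to exercise the sketch."""

    def __init__(self, job_ids):
        self._running = list(job_ids)
        self.quotas = {}

    def running_jobs(self):
        return list(self._running)  # copy, so callers can iterate safely

    def kill(self, job_id):
        self._running.remove(job_id)

    def suspend(self, job_id):
        self._running.remove(job_id)

    def set_quota(self, job_id, quota):
        self.quotas[job_id] = quota


class ExternalResourceManager:
    """(a): queue concepts and policy live here, outside the grid."""

    def __init__(self, grid, job_queue, max_running):
        self.grid = grid
        self.job_queue = job_queue      # local database: job id -> queue name
        self.max_running = max_running  # assumed policy: queue -> max running jobs

    def enforce(self):
        """Suspend jobs that exceed their queue's limit; return the suspended ids."""
        running_per_queue = {}
        suspended = []
        for job_id in self.grid.running_jobs():
            queue = self.job_queue[job_id]
            running_per_queue[queue] = running_per_queue.get(queue, 0) + 1
            if running_per_queue[queue] > self.max_running[queue]:
                self.grid.suspend(job_id)
                suspended.append(job_id)
        return suspended
```

With a grid running j1, j2 and j3, where j1 and j2 belong to the same queue with a limit of one, enforce() would suspend j2 and leave the other two running. Swapping the policy would not touch GridJobControl at all, which is the separation the message argues for.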
> On Jun 2, 2008, at 10:39 PM, Vivek Ratan wrote:
> > I'd like to get some feedback on how to implement configuration for 
> > queues.
> >
> > Quick background: As part of the new Resource Manager in Hadoop 
> > (HADOOP-3421), a single Hadoop installation supports one or more 
> > queues that users submit jobs to. Eventually (it is hoped), an 
> > installation will support one or more Orgs, each with one or more 
> > queues.
> >
> > The problem: Queues have attributes: a name, whether it supports
> > priorities, a 'guaranteed capacity', a list of allowed users, a list
> > of rejected users, and so on. How do we handle this configuration?
> > Some constraints:
> > * We'd like the default installation to have a single queue with
> > default values, so the system works out of the box, but an admin can
> > configure multiple queues, each with its own config values. Different
> > installations can have different number of queues.
> > * Orgs, queues, and users provide a hierarchy. You could set default
> > values for some config variables in the Org, and individual queues
> > could override them. Similarly, individual users could override
> > Org/queue defaults. This is more of a long term goal. For V1, we can
> > get away with queue-specific configuration only.
> > * Some config values can be changed by the admin while the system is
> > running, and these need to be re-read by Hadoop within a reasonable
> > amount of time. For example, an admin may dynamically change the
> > guaranteed capacity of queues (if new machines are added to a cluster,
> > for example). You don't want to restart the JT to read new values.
> >
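
The Org/queue/user hierarchy in the constraints above can be sketched as a layered lookup. This is a minimal Python illustration, not Hadoop code: the function name effective_config and the attribute keys (guaranteed_capacity, supports_priorities) are assumptions made for the example. It relies on collections.ChainMap, which returns the value from the first mapping that contains a key.

```python
from collections import ChainMap


def effective_config(org_defaults, queue_overrides=None, user_overrides=None):
    """Resolve settings with user values winning over queue values,
    and queue values winning over Org defaults."""
    layers = ChainMap(user_overrides or {}, queue_overrides or {}, org_defaults)
    return dict(layers)


# Hypothetical attribute names and values, for illustration only.
org = {"supports_priorities": False, "guaranteed_capacity": 10}
queue = {"guaranteed_capacity": 40}

resolved = effective_config(org, queue)
```

Here resolved keeps supports_priorities from the Org and takes guaranteed_capacity from the queue. For V1, with queue-specific configuration only, the same function would simply be called without the user layer.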
> > What are the implementation choices we're facing?
> >
> > - Where do we specify config values? It seems clear that
> > hadoop-default.xml should contain configuration for a single queue
> > with appropriate default values. If an admin wants to set up multiple
> > queues, this information can go in a separate file. hadoop-site.xml?
> > Or maybe a separate config file for queues? While having multiple
> > config files leads to problems in managing them, there are some
> > supporting arguments for a separate config file for queues: this file
> > can be re-read periodically (which avoids having to re-read
> > hadoop-site.xml), and this file is only read by the JobTracker, so
> > there's no issue of overriding values elsewhere.
> >
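
The "re-read periodically" argument for a separate queue file can be sketched as a modification-time check. This is a minimal Python illustration under loud assumptions: the class name ReloadingQueueConfig is hypothetical, and the file is JSON purely to keep the example short, whereas a real Hadoop config file would be XML. The caller is assumed to invoke refresh() from a timer or scheduling loop.

```python
import json
import os


class ReloadingQueueConfig:
    """Holds queue settings and re-reads the file only when it has changed."""

    def __init__(self, path):
        self.path = path
        self._mtime_ns = None
        self.queues = {}
        self.refresh()

    def refresh(self):
        """Reload if the file's modification time differs from the last load.

        Returns True when the settings were re-read, False otherwise.
        """
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns == self._mtime_ns:
            return False
        with open(self.path) as f:
            self.queues = json.load(f)
        self._mtime_ns = mtime_ns
        return True
```

A process using this would pick up a changed guaranteed capacity on the next refresh() call without a restart. A real implementation would also need to decide what to do with a half-written or invalid file, which this sketch ignores.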
> > - How do we specify config values? You can have more than one queue,
> > and each queue has its attributes.
> > * You could have properties like "hadoop.scheduler.queue1.name",
> > "hadoop.scheduler.queue1.guaranteed_capacity",
> > "hadoop.scheduler.queue2.name"... Bit of a pain to write, and since
> > the number of queues is not known statically, how do you know how many
> > of these properties to read? You could have a separate property that
> > tells you how many queues there are, and then the JT can build the
> > property names dynamically. (HADOOP-3407 would also help here).
> > * You may want comma separated values. So 
> > "hadoop.scheduler.queue.names"
> > would have comma separated values for all queue names, 
> > "hadoop.scheduler.queue.guaranteed_capacity" would have comma 
> > separated values for capacities for each queue. This can get very 
> > difficult to maintain as you have to make sure the attribute values 
> > for each queue show up in the right place among the comma separated 
> > values.
> >
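
The two naming schemes above can be compared side by side. This is a minimal Python sketch over a plain dict of string properties, not Hadoop's Configuration class. The property names for queue names and capacities come from the message; the count property "hadoop.scheduler.queue.count", the function names, and the choice to treat capacity as an integer are assumptions made for the example.

```python
def queues_from_indexed(props, prefix="hadoop.scheduler"):
    """Scheme 1: read a queue count, then build per-queue property names."""
    count = int(props.get(f"{prefix}.queue.count", "0"))  # hypothetical property
    queues = []
    for i in range(1, count + 1):
        base = f"{prefix}.queue{i}"
        queues.append({
            "name": props[f"{base}.name"],
            "guaranteed_capacity": int(props[f"{base}.guaranteed_capacity"]),
        })
    return queues


def queues_from_csv(props, prefix="hadoop.scheduler"):
    """Scheme 2: parallel comma-separated lists, matched purely by position."""
    names = [s.strip() for s in props[f"{prefix}.queue.names"].split(",")]
    caps = [s.strip() for s in props[f"{prefix}.queue.guaranteed_capacity"].split(",")]
    if len(names) != len(caps):
        # The maintenance hazard mentioned above: lists drifting out of step.
        raise ValueError("queue names and capacities have different lengths")
    return [{"name": n, "guaranteed_capacity": int(c)} for n, c in zip(names, caps)]


indexed_props = {
    "hadoop.scheduler.queue.count": "2",
    "hadoop.scheduler.queue1.name": "default",
    "hadoop.scheduler.queue1.guaranteed_capacity": "60",
    "hadoop.scheduler.queue2.name": "research",
    "hadoop.scheduler.queue2.guaranteed_capacity": "40",
}
csv_props = {
    "hadoop.scheduler.queue.names": "default, research",
    "hadoop.scheduler.queue.guaranteed_capacity": "60, 40",
}
```

Both calls produce the same list of queues for this data. The length check in the second scheme catches a missing value, but it cannot catch two values swapped within a list, which is the harder problem with positional matching.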
> > Do we have any examples of config settings where you can have a 
> > dynamic number of top-level entries, and each entry has multiple 
> > attributes?
> >
> > Should this discussion be on a Jira? I hesitated, as there seems to be
> > more than one issue to resolve here.
