oodt-dev mailing list archives

From "Resneck, Gabriel M (388J)" <Gabriel.M.Resn...@jpl.nasa.gov>
Subject RE: Capacity vs Load in Resource Manager
Date Tue, 17 Apr 2012 16:25:58 GMT
Chris is right. The concepts of Load and Capacity do have meaning, and if I gave the impression
that these constructs are meaningless, then that's my bad.
What I was trying to convey in my explanation was simply that Load and Capacity have no automatic
relation to the resources required by jobs or available on nodes. To establish that relation,
you must configure load and capacity at the levels the user sees as appropriate.

Gabe =)

________________________________________
From: Mattmann, Chris A (388J) [chris.a.mattmann@jpl.nasa.gov]
Sent: Monday, April 16, 2012 8:27 PM
To: <dev@oodt.apache.org>
Cc: Wong, Cynthia L (388J)
Subject: Re: Capacity vs Load in Resource Manager

Hi Gabe,

On Apr 16, 2012, at 11:44 AM, Resneck, Gabriel M (388J) wrote:

>
> To use Chris's words, when using the "fresh-out-of-the-box" version of the RM, both of
> the concepts of Capacity and Load are entirely arbitrary.

I'd clarify that while the default values set for these concepts are arbitrary, the concepts
themselves are not. Capacity is used by the AssignmentMonitor and is a core property of the
ResourceNode class. Load is leveraged by the AssignmentMonitor to determine how busy a given
ResourceNode currently is.
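To make the capacity/load bookkeeping concrete, here is a minimal sketch under stated assumptions: `ResourceNode` below is a hypothetical stand-in (just an id and a capacity), and `VirtualAssignmentMonitor` with its `assign`/`reduce_load`/`get_load` methods is an illustrative name, not the RM's actual API. It only tracks the loads that jobs declare, which is the "virtual" behavior discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class ResourceNode:
    """Hypothetical stand-in for a resource node: an id plus a capacity."""
    node_id: str
    capacity: int


class VirtualAssignmentMonitor:
    """Hypothetical monitor: load is the sum of loads declared by assigned
    jobs, not anything measured on the machine itself."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        self.load = {n.node_id: 0 for n in nodes}

    def get_load(self, node_id):
        return self.load[node_id]

    def fits(self, node_id, job_load):
        # A job fits if its declared load does not push the node past capacity.
        return self.load[node_id] + job_load <= self.nodes[node_id].capacity

    def assign(self, node_id, job_load):
        # Record the job's declared load only if it fits; report success.
        if not self.fits(node_id, job_load):
            return False
        self.load[node_id] += job_load
        return True

    def reduce_load(self, node_id, job_load):
        # Called when a job finishes; never drop below zero.
        self.load[node_id] = max(0, self.load[node_id] - job_load)
```

With a capacity of 10 and jobs declaring a load of 1, ten assignments succeed and the eleventh is refused until a job finishes, which matches the "10 jobs on that node" example quoted next.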

> They have no relation to any kind of resources available on your node machines.

Well, again, the default out-of-the-box values for these concepts don't, but the concepts
themselves do.

> Therefore, if you give each job a load of 1 (regardless of the node resources required
> to run the job) and if you give a node a capacity of 10, the RM will try to always have 10
> jobs running on that node.
> It does nothing to track resource usage on the node, so use of such a paradigm as the
> one that I just described could be wildly inefficient.

Let's clarify that again. Saying it *does nothing* doesn't sound right to me. It *does* do
something. It tracks how much load is currently on a node compared to its capacity, and
provides that information as-is to the Scheduler, which in turn uses it in a node "besting"
algorithm to decide which node to batch a job out to. So it does *do something*; it's just
not real-time monitoring, but rather a form of virtual profiling. And, let's be specific: the
XMLAssignmentMonitor decides how this information is tracked, used, and provided. It is just
one potential implementation of the AssignmentMonitor RM extension point.
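As an illustration of what a "besting" step could look like, here is a sketch that assumes one simple ranking rule: among nodes that still have room for the job, pick the one with the lowest fraction of capacity in use. The function `pick_node` and this rule are hypothetical; the thread does not specify the Scheduler's actual criteria.

```python
def pick_node(capacities, loads, job_load):
    """Hypothetical besting rule: return the id of the node with the most
    relative headroom that can still accept job_load, or None if none can.

    capacities, loads: dicts keyed by node id, as a monitor might supply them.
    """
    candidates = [
        node for node, cap in capacities.items()
        if cap > 0 and loads.get(node, 0) + job_load <= cap
    ]
    if not candidates:
        return None
    # Lowest used fraction wins; ties are broken by node id for determinism.
    return min(candidates, key=lambda n: (loads.get(n, 0) / capacities[n], n))
```

Because the inputs are just declared numbers, the choice is only as good as the load and capacity values configured for jobs and nodes.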

We could (and should) develop a Ganglia-based resource monitor that plugs in and leverages
Ganglia's information. We could also develop a TorqueAssignmentMonitor that uses qmon or
something like it to parse the information out of Torque's queue, or connect into Sun Grid
Engine (SGE) or another DRM technology to get this information.
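The value of an extension point is that the code making decisions depends only on an interface, so a virtual implementation and one backed by an external system can be swapped. The sketch below assumes a hypothetical `LoadSource` interface with two illustrative implementations; the `poll` callable stands in for whatever a Ganglia or DRM integration would actually provide and is not a real API of either system.

```python
from typing import Callable, Dict, Protocol


class LoadSource(Protocol):
    """Hypothetical extension point: anything that can report a node's load."""

    def get_load(self, node_id: str) -> int: ...


class DeclaredLoadSource:
    """Virtual profiling: sums the loads that jobs declared when assigned."""

    def __init__(self) -> None:
        self.declared: Dict[str, int] = {}

    def add(self, node_id: str, job_load: int) -> None:
        self.declared[node_id] = self.declared.get(node_id, 0) + job_load

    def get_load(self, node_id: str) -> int:
        return self.declared.get(node_id, 0)


class PolledLoadSource:
    """Reads load from an external poller (hypothetical stand-in for a
    wrapper around Ganglia metrics or a DRM queue listing)."""

    def __init__(self, poll: Callable[[], Dict[str, int]]) -> None:
        self.poll = poll

    def get_load(self, node_id: str) -> int:
        return self.poll().get(node_id, 0)


def has_room(source: LoadSource, node_id: str, capacity: int, job_load: int) -> bool:
    # Only the LoadSource interface is used, so either implementation
    # can be plugged in without changing this check.
    return source.get_load(node_id) + job_load <= capacity
```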


> Because these numbers are arbitrary, I recommend carefully investigating the availability
> of resources on your nodes and setting load and capacity levels using that information. For
> example, if you find that your jobs tend to be I/O bound when you have more than 3 running
> simultaneously on the same node, then you could set your job load to 1 and the node capacity
> to 3. If you wanted more granularity, you could easily set the load to 33 and the capacity
> to 100. Since these numbers are entirely arbitrary, you have the freedom to make such changes.
> Obviously, not all jobs will be the same, so you may want to assign different loads to different
> jobs and assign different capacities to nodes based upon the resources that each makes available.
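The arithmetic behind those two settings can be checked directly. This sketch assumes a job fits only when the summed declared loads stay at or below the node's capacity; the helper names are hypothetical. Load 1 on capacity 3 and load 33 on capacity 100 both allow 3 concurrent jobs (the latter uses 99 and leaves 1 unused), while the finer 0 to 100 scale also makes room for mixed jobs.

```python
def max_concurrent(capacity, job_load):
    """How many identical jobs of job_load fit on a node of the given capacity."""
    if job_load <= 0:
        raise ValueError("job_load must be positive")
    return capacity // job_load


def fits_mix(capacity, job_loads):
    """True if a mix of jobs with different declared loads fits together."""
    return sum(job_loads) <= capacity


print(max_concurrent(3, 1))            # coarse scale
print(max_concurrent(100, 33))         # finer scale, same limit
print(fits_mix(100, [50, 25]))         # one heavy plus one light job
print(fits_mix(100, [50, 25, 33]))     # adding a third job overflows
```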

Exactly. And to add to that, you can group different jobs into different queues, and then map
queues to nodes, to control the flow of jobs onto those nodes based on a "queue type".
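Queue-to-node mapping narrows where a job may go before capacity is even considered. In this sketch the queue names, node ids, and the `eligible_nodes` helper are all hypothetical; it assumes a job is eligible for a node only if the node belongs to the job's queue and still has room for the job's declared load.

```python
# Hypothetical queue names and node ids, for illustration only.
QUEUES = {
    "io-heavy": {"node1"},
    "quick": {"node1", "node2", "node3"},
}


def eligible_nodes(queue_to_nodes, capacities, loads, job_queue, job_load):
    """Nodes mapped to the job's queue that still have room for job_load."""
    return sorted(
        n for n in queue_to_nodes.get(job_queue, set())
        if loads.get(n, 0) + job_load <= capacities.get(n, 0)
    )


caps = {"node1": 3, "node2": 10, "node3": 10}
current = {"node1": 3, "node2": 4}
print(eligible_nodes(QUEUES, caps, current, "io-heavy", 1))
print(eligible_nodes(QUEUES, caps, current, "quick", 1))
```

Here the I/O-heavy queue is confined to a single node, so once that node is full its jobs wait rather than spill onto nodes reserved for quick jobs.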

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

