oodt-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Capacity vs Load in Resource Manager
Date Wed, 18 Apr 2012 05:05:27 GMT
Thanks, Gabe, for following up. I appreciate your comments!

Cheers,
Chris

On Apr 17, 2012, at 9:25 AM, Resneck, Gabriel M (388J) wrote:

> Chris is right. The concepts of Load and Capacity do have meaning, and if I gave the
> impression that these constructs are meaningless, then that's my bad.
> What I was trying to convey in my explanation was simply that Load and Capacity have
> no automatic relation to the resources required by jobs or available on nodes. To
> provide that relation, you must configure load and capacity at levels that you, the
> user, see as appropriate.
> 
> Gabe =)
> 
> ________________________________________
> From: Mattmann, Chris A (388J) [chris.a.mattmann@jpl.nasa.gov]
> Sent: Monday, April 16, 2012 8:27 PM
> To: <dev@oodt.apache.org>
> Cc: Wong, Cynthia L (388J)
> Subject: Re: Capacity vs Load in Resource Manager
> 
> Hi Gabe,
> 
> On Apr 16, 2012, at 11:44 AM, Resneck, Gabriel M (388J) wrote:
> 
>> 
>> To use Chris's words, when using the "fresh-out-of-the-box" version of the RM, both
>> of the concepts of Capacity and Load are entirely arbitrary.
> 
> I'd clarify that while the default values set for these concepts are arbitrary, the concepts
> themselves are not. Capacity is used by the AssignmentMonitor and is a core property of
> the ResourceNode class. Load is leveraged by the AssignmentMonitor to determine how busy
> each ResourceNode currently is.
> 
>> They have no relation to any kind of resources available on your node machines.
> 
> Well, again, the default out-of-the-box values for these concepts don't, but the concepts
> themselves do.
> 
>> Therefore, if you give each job a load of 1 (regardless of the node resources required
>> to run the job) and if you give a node a capacity of 10, the RM will try to always have 10
>> jobs running on that node.
>> It does nothing to track resource usage on the node, so use of such a paradigm as
>> the one that I just described could be wildly inefficient.
> 
> Let's clarify that again. Saying it *does nothing* doesn't sound right to me. It *does*
> do something. It tracks how much load is currently on a node, compared to its current
> capacity, and provides that information as-is to the Scheduler, which in turn uses it in
> a node "besting" algorithm to select the node to batch a job out to. So, it does *do
> something*. It's just that it's not real-time monitoring; it's more like virtual
> profiling. And, let's be specific: the XMLAssignmentMonitor decides how this information
> is used, provided, and tracked. It is just one potential implementation of the
> AssignmentMonitor RM extension point.
> 
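The "virtual profiling" described above can be illustrated with a minimal Python sketch. It is not OODT code: the `Node` class and the `pick_node`, `assign`, and `release` functions are hypothetical names, and the "most free capacity wins" rule is only an assumed stand-in for whatever "besting" algorithm the Scheduler actually applies.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a resource node (not the OODT ResourceNode class)."""
    name: str
    capacity: int  # configured by the operator
    load: int = 0  # sum of the loads of jobs currently assigned

    def free(self) -> int:
        return self.capacity - self.load


def pick_node(nodes, job_load):
    """Assumed selection rule: the node with the most free capacity that fits the job."""
    candidates = [n for n in nodes if n.free() >= job_load]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free())


def assign(nodes, job_load):
    """Pick a node and record the job's load on it; returns None if nothing fits."""
    node = pick_node(nodes, job_load)
    if node is not None:
        # Pure bookkeeping: nothing is measured on the actual machine.
        node.load += job_load
    return node


def release(node, job_load):
    """Give the load back when a job finishes."""
    node.load = max(0, node.load - job_load)
```

With `nodes = [Node("node-a", 10), Node("node-b", 4)]`, a call to `assign(nodes, 1)` selects `node-a` and raises its tracked load to 1, regardless of what the machine itself is doing.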
> We could (and should) develop a Ganglia resource monitor that could leverage Ganglia
> information to plug in. And we could develop a TorqueAssignmentMonitor that uses qmon
> or something like it to parse the information out of Torque's queue. We could also
> connect in to Sun Grid Engine (SGE) or another DRM technology to get this information too.
> 
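The idea of swapping monitor implementations behind one extension point might look roughly like the following Python sketch. Everything here is hypothetical: `LoadMonitor`, `BookkeepingMonitor`, and `ExternalMonitor` are illustrative names, not OODT interfaces, and the `fetch` callable merely stands in for a query against Ganglia, Torque, SGE, or a similar system.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class LoadMonitor(ABC):
    """Hypothetical extension point: report the current load of a named node."""

    @abstractmethod
    def get_load(self, node_name: str) -> int:
        ...


class BookkeepingMonitor(LoadMonitor):
    """Derives load only from the jobs it was told about (virtual profiling)."""

    def __init__(self) -> None:
        self._loads: Dict[str, int] = {}

    def job_assigned(self, node_name: str, job_load: int) -> None:
        self._loads[node_name] = self._loads.get(node_name, 0) + job_load

    def get_load(self, node_name: str) -> int:
        return self._loads.get(node_name, 0)


class ExternalMonitor(LoadMonitor):
    """Delegates to a callable that would query an external monitoring system."""

    def __init__(self, fetch: Callable[[str], int]) -> None:
        self._fetch = fetch  # assumption: returns a load figure for the node

    def get_load(self, node_name: str) -> int:
        return self._fetch(node_name)
```

A scheduler written against `LoadMonitor` would not need to know whether the number came from bookkeeping or from a live measurement.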
> 
>> Because these numbers are arbitrary, I recommend carefully investigating the availability
>> of resources on your nodes and setting load and capacity levels using that information.
>> For example, if you find that your jobs tend to be I/O bound when you have more than 3
>> running simultaneously on the same node, then you could set your job load to 1 and the
>> node capacity to 3. If you wanted more granularity, you could easily set the load to 33
>> and the capacity to 100. Since these numbers are entirely arbitrary, you have the freedom
>> to make such changes. Obviously, not all jobs will be the same, so you may want to assign
>> different loads to different jobs and assign different capacities to nodes based upon the
>> resources that each makes available.
> 
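The arithmetic behind the 1/3 and 33/100 examples can be checked with a short Python sketch. The function names `max_concurrent` and `fits` are hypothetical helpers for illustration only, and the sketch assumes a node accepts a job whenever the summed load stays at or below its capacity.

```python
def max_concurrent(capacity: int, job_load: int) -> int:
    """How many identical jobs of job_load fit on a node of the given capacity."""
    if job_load <= 0:
        raise ValueError("job_load must be positive")
    return capacity // job_load


def fits(capacity: int, running_loads, new_load: int) -> bool:
    """Would a new job fit next to the jobs already running on the node?"""
    return sum(running_loads) + new_load <= capacity


# Both configurations from the example allow three simultaneous jobs.
print(max_concurrent(3, 1))     # 3
print(max_concurrent(100, 33))  # 3

# The finer scale leaves room to give jobs different weights.
print(fits(100, [33, 33], 34))  # True: 33 + 33 + 34 == 100
print(fits(100, [33, 33], 50))  # False: 116 exceeds the capacity of 100
```

The finer-grained scale does not change the limit for identical jobs, but it makes it possible to express a job that is, say, half again as heavy as another.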
> Exactly. And to add to that, you can group different jobs into different queues, and
> then queues to nodes, to control flow of jobs onto those nodes, based on a "queue type".
> 
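The jobs-to-queues-to-nodes grouping can be pictured as two lookups. The Python sketch below uses made-up queue, job, and node names and plain dictionaries; it is an assumption-laden illustration, not the Resource Manager's actual configuration format.

```python
# Hypothetical mappings; all names are invented for illustration.
JOB_TO_QUEUE = {
    "ingest": "io-heavy",
    "reprocess": "cpu-heavy",
}
QUEUE_TO_NODES = {
    "io-heavy": ["node-a"],
    "cpu-heavy": ["node-b", "node-c"],
}


def eligible_nodes(job_type, job_to_queue=JOB_TO_QUEUE, queue_to_nodes=QUEUE_TO_NODES):
    """Return the nodes a job may run on, based on the queue its type belongs to."""
    queue = job_to_queue.get(job_type)
    if queue is None:
        return []  # unknown job type: no queue, so no nodes
    return list(queue_to_nodes.get(queue, []))


print(eligible_nodes("ingest"))     # ['node-a']
print(eligible_nodes("reprocess"))  # ['node-b', 'node-c']
```

Load and capacity would then only need to be balanced among the nodes that a given queue can reach.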
> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 



