oodt-dev mailing list archives

From Bruce Barkstrom <brbarkst...@gmail.com>
Subject Capacity
Date Sat, 14 Apr 2012 20:06:17 GMT
Defining "capacity" is not an easy thing to do.
The books that David Patterson and colleagues
have written on computer design show that none of
the usual metrics, like MFLOPS, MIPS, and so
on, have any sort of well-founded theoretical
basis.  I tried using Dongarra's MFLOPS database
and found the most sensible predictor was just
the clock speed of the processor - but that was
before multi-core machines came along and
folks stopped increasing the clock speed to
limit power consumption.

The most sensible production metric that I can
think of is the wall clock time from job inception
to completion - probably including the time to
stage data to someplace useful and then the time
to move it from the computation back to a sensible
storage spot.  That metric at least fits into large
scale scheduling approaches (and looks like a
Gantt chart activity).
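
As a rough illustration of the metric described above, the sketch below
times one job from inception to completion, including staging data in and
moving results back out.  The function name time_job and the three phase
callables are hypothetical placeholders, not part of OODT or any existing
scheduler; a real job would supply its own staging and compute steps.

```python
import time


def time_job(stage_in, compute, stage_out):
    """Run the three phases in order and return wall-clock seconds.

    stage_in, compute and stage_out are hypothetical zero-argument
    callables standing in for a real job's phases.
    """
    timings = {}
    job_start = time.perf_counter()
    phases = (("stage_in", stage_in),
              ("compute", compute),
              ("stage_out", stage_out))
    for name, phase in phases:
        phase_start = time.perf_counter()
        phase()
        timings[name] = time.perf_counter() - phase_start
    # Total covers job inception to completion, like one Gantt chart bar.
    timings["total"] = time.perf_counter() - job_start
    return timings


if __name__ == "__main__":
    # Stand-in phases that just sleep; replace with real work.
    result = time_job(lambda: time.sleep(0.02),
                      lambda: time.sleep(0.05),
                      lambda: time.sleep(0.01))
    for name, seconds in result.items():
        print(f"{name}: {seconds:.3f} s")
```

Repeating such a run several times and keeping every total, rather than a
single number, would give a first look at how variable the job time is.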

DOE has apparently been trying to simulate
actual computational loads for some supercomputer
simulations - but they don't try to apply their
simulations to the entire runs they're going
to make - even they don't have the computer
power for that.

I've also got a book on manufacturing systems
engineering that says capacity is a stochastic
property of systems.  I'll even note that schedules
can have a structurally stochastic behavior (as in
"I didn't know the machine was going to catch on
fire - and it burned the whole factory - so now what
do we do?")

Thus, the key guidance is - keep the metric
simple.  If you want solid numerical values, you'll
have to run an experiment on how long it will take
to run a job.

I should perhaps note that there are similar pleasantries on
trying to do network capacity estimates.  One of the typical
approaches to dealing with capacities is to do a queuing
theory model.  If I recall, the time for delivery of a file over
the Internet has a very long-tailed distribution for completion.
An interested party could probably build a probability distribution
and then use occasional tests to update the distribution.
However, simple math with a few parameters it's not.
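
One possible reading of "build a probability distribution and update it
with occasional tests" is an empirical distribution over a sliding window
of measured transfer times.  The sketch below is only that: the class name
TransferTimeEstimate, the window size, and the nearest-rank quantile rule
are all assumptions made for illustration, and it is not a queuing theory
model.  Reporting a high quantile next to the median is one way to keep the
long tail visible.

```python
import math
from collections import deque


class TransferTimeEstimate:
    """Empirical distribution of file transfer times (seconds).

    Hypothetical helper: keeps the most recent `window` test results
    and answers quantile queries from them.
    """

    def __init__(self, window=500):
        self.samples = deque(maxlen=window)

    def update(self, seconds):
        """Record the result of one occasional transfer test."""
        self.samples.append(float(seconds))

    def quantile(self, q):
        """Return the nearest-rank q-quantile, for 0 < q <= 1."""
        if not self.samples:
            raise ValueError("no transfer times recorded yet")
        if not 0 < q <= 1:
            raise ValueError("q must be in (0, 1]")
        ordered = sorted(self.samples)
        rank = math.ceil(q * len(ordered))  # 1-based nearest rank
        return ordered[rank - 1]


if __name__ == "__main__":
    est = TransferTimeEstimate(window=100)
    for t in (1.0, 2.0, 3.0, 4.0, 100.0):  # made-up test results
        est.update(t)
    print("median:", est.quantile(0.5))
    print("95th percentile:", est.quantile(0.95))
```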

Bruce R. Barkstrom
