The beta distribution can help with your problem of marking a system as
working or not, but I think that a better approach is the Poisson
distribution, which is intended to describe events occurring in time.
The rub is that almost all processes worth monitoring have highly variable
rates and all of the simple statistical models presume a constant rate.
If your rate variations are predictable and you like the plush leather, very
clever, very correct approach, you can use variants of Poisson regression.
If you want a very simple method and your major source of variation is time
of day and day of week, you can get a good approximation by totalling the
number of events for the same hour on the same day of the week in recent
history and dividing by the number of weeks you looked at.
This gives you a rate which is the fundamental parameter for the Poisson
distribution.
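A minimal Java sketch of that hour-of-week average, assuming event timestamps are available as java.time.LocalDateTime values in a single time zone. The class and method names (HourOfWeekRate, ratePerHour) are hypothetical and not part of Commons Math; the current, partial hour is deliberately left out so that each week contributes exactly one complete slot.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: baseline Poisson rate from the same hour-of-week in recent history. */
class HourOfWeekRate {

    /**
     * Expected events per hour for the hour-of-week containing {@code now}:
     * counts events in the same weekday/hour slot over the previous
     * {@code weeks} complete weeks and divides by {@code weeks}.
     */
    static double ratePerHour(List<LocalDateTime> events, LocalDateTime now, int weeks) {
        LocalDateTime end = now.truncatedTo(ChronoUnit.HOURS); // exclude the current, partial hour
        LocalDateTime start = end.minusWeeks(weeks);
        DayOfWeek day = now.getDayOfWeek();
        int hour = now.getHour();
        long count = events.stream()
                .filter(t -> !t.isBefore(start) && t.isBefore(end))
                .filter(t -> t.getDayOfWeek() == day && t.getHour() == hour)
                .count();
        return (double) count / weeks;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2011, 3, 17, 10, 30);
        // Hypothetical history: two events in the 10:00 slot on each of the past four weeks.
        List<LocalDateTime> events = new ArrayList<>();
        for (int k = 1; k <= 4; k++) {
            events.add(now.minusWeeks(k).withMinute(5));
            events.add(now.minusWeeks(k).withMinute(50));
        }
        System.out.println(ratePerHour(events, now, 4));
    }
}
```

With a few weeks of history this is noisy for quiet hours; widening the slot (e.g. to neighbouring hours) is a common trade of bias for variance.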
The simplest alarm is based on the time since the last event. That time
should be scaled by the current rate, as estimated by the average rate of
like periods. You can compute the significance of this by taking the
exponential of the negative normalized time since the last event, i.e.
exp(-rate * elapsed time), or you can build a table of normalized times
between events and set your cutoff to give you a desired balance of false
alarms and late detections. I usually set such an alarm to give a false
alarm about once a month for system operators and about once a year for
CEOs.
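A minimal Java sketch of both options, assuming the rate is in events per hour (for instance from an hour-of-week average) and that historical gaps have already been multiplied by the rate in effect when they occurred. SilenceAlarm, silenceProbability, empiricalCutoff and the cutoff value in main are hypothetical names and numbers, not an existing API.

```java
import java.util.Arrays;

/** Hypothetical "time since last event" alarm for a Poisson process. */
class SilenceAlarm {

    /**
     * Under a Poisson process with rate r (events per hour), the chance of
     * seeing no event at all for t hours is exp(-r * t); r * t is the
     * normalized time since the last event.
     */
    static double silenceProbability(double hoursSinceLastEvent, double ratePerHour) {
        return Math.exp(-ratePerHour * hoursSinceLastEvent);
    }

    /**
     * Table-driven alternative: given historical normalized gaps between events,
     * return (roughly) the gap that only {@code falseAlarmFraction} of them exceed.
     */
    static double empiricalCutoff(double[] normalizedGaps, double falseAlarmFraction) {
        double[] sorted = normalizedGaps.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil((1 - falseAlarmFraction) * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(sorted.length - 1, idx))];
    }

    public static void main(String[] args) {
        double rate = 2.0;     // hypothetical: 2 events/hour expected for this hour of the week
        double pCutoff = 1e-3; // hypothetical cutoff; tune against the false alarm rate you can tolerate
        for (double hours : new double[] {0.5, 2.0, 4.0}) {
            double p = silenceProbability(hours, rate);
            System.out.printf("%.1f h silent: p = %.5f%s%n", hours, p, p < pCutoff ? "  ALARM" : "");
        }
    }
}
```

Under the constant-rate Poisson assumption, the fraction of gaps whose normalized length exceeds the cutoff is exactly pCutoff, so false alarms per month come out at roughly (events per month) * pCutoff. That is one way to pick a cutoff for "about once a month" versus "about once a year".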
On Thu, Mar 17, 2011 at 7:36 AM, Mikkel Meyer Andersen <mikl@mikl.dk> wrote:
> 2011/3/17 KARR, DAVID (ATTSI) <dk068x@att.com>:
> >> -----Original Message-----
> >> From: Mikkel Meyer Andersen [mailto:mikl@mikl.dk]
> >> Sent: Thursday, March 17, 2011 3:22 AM
> >> To: Commons Users List
> >> Subject: Re: [math] Anyone using BetaDistribution and
> >> BetaDistributionImpl
> >>
> >> Hi David,
> >>
> >> Yes, I am using the implementation of the beta distribution and am
> >> quite happy with it. Anything in particular you're thinking of that I
> >> can help you with?
> >
> > Well, primarily you could help me figure out how to use it :) , but
> > that's more of an issue with not understanding the statistics, as
> > opposed to not understanding the API.
> I would love to help you with usage of the API. With regard to
> statistics, I can only provide a limited amount of help since this is
> for commercial usage. For thorough statistical consultancy, I can
> provide assistance through my company (in that case, contact me
> directly).
> >
> > We have a large collection of individual data records that indicate
> > success/failure of an operation call, where there are a significant
> > number of possible operations, along with other permutations in the
> > record that associate it with a different "workflow". We can get
> > success/failure ratio of those operations in workflows over particular
> > time periods (15, 60, 180, 1440 minutes, et cetera), but we want to
> > process this data over a much larger time period (30 days or more) to
> > determine what's "normal" for those operations in the various workflows,
> > essentially building percentage ranges for each of those permutations
> > that indicate whether a permutation is "green", "orange", or "red".
> >
> > I've been told by someone who understands the statistics only a little
> > better than me that a beta distribution function could help here, but I
> > won't be able to implement this until we get help from someone who
> > really understands the statistics here.
> If the outcome of each call is only either success or failure, the
> number of successes in n calls follows a so-called Binomial
> distribution, and inference about the probability parameter (the
> success rate) can be made in several ways. The easiest is to do
> classical inference in the Binomial distribution. Another option,
> involving the Beta distribution, is to do a Bayesian analysis (the
> Beta serves as a prior for the Binomial likelihood, yielding a Beta
> posterior). That's probably where the Beta was thought to come into
> play. If you want to adjust for some covariates (such as the time of
> day the call happened), you can use Binomial regression, e.g. logistic
> or probit regression. With regression you can test whether the
> covariates influence the outcome, e.g. whether calls made in the
> morning have a higher success rate, or whatever might be the case.
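Since the thread is about Commons Math's BetaDistribution, here is a self-contained Java sketch of the conjugate update described above: a Beta(alpha, beta) prior plus k successes and f failures gives a Beta(alpha + k, beta + f) posterior. To run with the JDK alone, it computes Beta quantiles itself (regularized incomplete beta via the Numerical Recipes continued fraction, then bisection); with Commons Math 3 on the classpath, new BetaDistribution(a, b).inverseCumulativeProbability(p) should give the same quantiles. The class name, the flat Beta(1, 1) prior and the counts in main are hypothetical.

```java
/** Hypothetical Beta-Binomial posterior helper; only the JDK is needed. */
class BetaBinomialPosterior {

    /** Posterior parameters {alpha + successes, beta + failures}. */
    static double[] posterior(double alpha, double beta, long successes, long failures) {
        return new double[] {alpha + successes, beta + failures};
    }

    /** Mean of a Beta(a, b) distribution, i.e. the posterior mean success rate. */
    static double mean(double a, double b) {
        return a / (a + b);
    }

    /** ln Gamma(x) for x > 0 (Lanczos approximation as in Numerical Recipes). */
    static double logGamma(double x) {
        double[] cof = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : cof) {
            ser += c / ++y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    /** Continued fraction for the incomplete beta function (modified Lentz method). */
    static double betaContinuedFraction(double a, double b, double x) {
        final double eps = 1e-14, tiny = 1e-300;
        double qab = a + b, qap = a + 1, qam = a - 1;
        double c = 1, d = 1 - qab * x / qap;
        d = 1 / (Math.abs(d) < tiny ? tiny : d);
        double h = d;
        for (int m = 1; m <= 10_000; m++) {
            int m2 = 2 * m;
            // Even step of the continued fraction.
            double aa = m * (b - m) * x / ((qam + m2) * (a + m2));
            d = 1 + aa * d;
            c = 1 + aa / c;
            d = 1 / (Math.abs(d) < tiny ? tiny : d);
            c = Math.abs(c) < tiny ? tiny : c;
            h *= d * c;
            // Odd step of the continued fraction.
            aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
            d = 1 + aa * d;
            c = 1 + aa / c;
            d = 1 / (Math.abs(d) < tiny ? tiny : d);
            c = Math.abs(c) < tiny ? tiny : c;
            double del = d * c;
            h *= del;
            if (Math.abs(del - 1) < eps) {
                break;
            }
        }
        return h;
    }

    /** Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF at x. */
    static double regIncBeta(double a, double b, double x) {
        if (x <= 0) return 0;
        if (x >= 1) return 1;
        double front = Math.exp(logGamma(a + b) - logGamma(a) - logGamma(b)
                + a * Math.log(x) + b * Math.log1p(-x));
        // Use the continued fraction where it converges quickly, else the symmetry relation.
        if (x < (a + 1) / (a + b + 2)) {
            return front * betaContinuedFraction(a, b, x) / a;
        }
        return 1 - front * betaContinuedFraction(b, a, 1 - x) / b;
    }

    /** Beta(a, b) quantile by bisection on the CDF. */
    static double betaQuantile(double a, double b, double p) {
        double lo = 0, hi = 1;
        for (int i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            if (regIncBeta(a, b, mid) < p) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    public static void main(String[] args) {
        // Hypothetical 30-day history: 9,500 successes, 500 failures, flat Beta(1, 1) prior.
        double[] post = posterior(1, 1, 9_500, 500);
        double lo = betaQuantile(post[0], post[1], 0.05);
        double hi = betaQuantile(post[0], post[1], 0.95);
        System.out.printf("posterior mean %.4f, 90%% credible interval [%.4f, %.4f]%n",
                mean(post[0], post[1]), lo, hi);
    }
}
```

With tens of thousands of calls the prior hardly matters; it mainly helps for sparse workflows, where a handful of calls would otherwise give very wide or degenerate classical intervals.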
>
> Indicator colors can then be assigned relative to the normal rate,
> e.g. green if the current success rate lies within the 75% interval
> (a confidence interval in the classical approach, a credible interval
> in the Bayesian one), orange if it lies within the 90% interval, and
> red otherwise.
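One possible reading of that color rule in the classical setting, sketched in Java with a normal approximation to the Binomial: treat the 30-day success rate as "normal" and measure how many standard errors the current window's rate is from it. 1.1503 and 1.6449 are the standard normal values bounding the central 75% and 90% of the distribution. StatusColor, classify and the numbers in main are hypothetical, and the approximation assumes enough calls per window (n * p and n * (1 - p) both comfortably above 5).

```java
/** Hypothetical green/orange/red classifier based on a normal approximation to the Binomial. */
class StatusColor {

    enum Color { GREEN, ORANGE, RED }

    // Two-sided standard normal critical values for the central 75% and 90% intervals.
    static final double Z_75 = 1.1503;
    static final double Z_90 = 1.6449;

    /**
     * @param normalRate historical success rate, assumed strictly between 0 and 1
     * @param successes  successes in the current window
     * @param calls      total calls in the current window (must be > 0)
     */
    static Color classify(double normalRate, long successes, long calls) {
        double observed = (double) successes / calls;
        // Standard error of the observed proportion if the normal rate still holds.
        double se = Math.sqrt(normalRate * (1 - normalRate) / calls);
        double z = Math.abs(observed - normalRate) / se;
        if (z <= Z_75) return Color.GREEN;
        if (z <= Z_90) return Color.ORANGE;
        return Color.RED;
    }

    public static void main(String[] args) {
        // Hypothetical: 95% success is normal for this operation/workflow; 1,000 calls in the window.
        for (long ok : new long[] {950, 940, 930}) {
            System.out.println(ok + "/1000 -> " + classify(0.95, ok, 1000));
        }
    }
}
```

For monitoring, only drops in the success rate may matter; in that case a one-sided comparison (flag only when observed < normalRate, with one-sided critical values) would be the natural variant.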
>
> Cheers, Mikkel.
> >
> > 
> > To unsubscribe, email: user-unsubscribe@commons.apache.org
> > For additional commands, email: user-help@commons.apache.org