commons-user mailing list archives

From Mikkel Meyer Andersen <m...@mikl.dk>
Subject Re: [math] Anyone using BetaDistribution and BetaDistributionImpl
Date Thu, 17 Mar 2011 21:28:48 GMT
Hi,

I agree that Poisson would be the obvious choice if what is being modeled
is the number of events in a given period of time. But as far as I
understood, the success rate is also of interest (assuming two types
of events: successes and failures). So which model to use really does
depend on what you want to make inferences about.

Cheers, Mikkel.

2011/3/17 Ted Dunning <ted.dunning@gmail.com>:
>
> The beta distribution can help with your problem of marking a system as
> working or not, but I think that a better approach is the Poisson
> distribution which is intended to describe events occurring in time.
> The rub is that almost all processes worth monitoring have highly variable
> rates and all of the simple statistical models presume a constant rate.
> If your rate variations are predictable and you like the plush leather, very
> clever, very correct approach, you can use variants on Poisson regression.
> If you want to get results very simply and your major source of variation is
> time of day and day of week, you can get a good approximation by simply
> totalling the number of events for the same hour on the same day of the week
> in recent history.
> This gives you a rate which is the fundamental parameter for the Poisson
> distribution.
> The simplest alarm is based on the time since last event.  That time should
> be scaled by the current rate as estimated by the average rate of like
> periods.  You can compute the significance of this by taking the exponential
> of the negative normalized time since last event or you can build a table of
> normalized times between events and set your cutoff to give you a desired
> balance of false alarms and late detections.  I usually set such an alarm to
> give a false alarm about once a month for system operators and about once a
> year for CEOs.
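
A minimal sketch of the alarm described above, in plain Java with no library dependencies. It assumes the rate is estimated by averaging event counts over comparable past periods (for example, the same hour on the same weekday) and that gaps between events are exponentially distributed, as a constant-rate Poisson process implies. The class name, method names, and the cutoff parameter are hypothetical and chosen only for illustration.

```java
/** Sketch of a "time since last event" alarm under a constant-rate Poisson assumption. */
final class PoissonAlarm {

    /**
     * Estimates events per minute from counts observed in comparable past
     * periods, e.g. the same hour on the same day of the week in recent weeks.
     */
    static double ratePerMinute(int[] countsInLikePeriods, double periodMinutes) {
        if (countsInLikePeriods.length == 0 || periodMinutes <= 0) {
            throw new IllegalArgumentException("need at least one period of positive length");
        }
        long total = 0;
        for (int count : countsInLikePeriods) {
            total += count;
        }
        return total / (countsInLikePeriods.length * periodMinutes);
    }

    /**
     * Probability of seeing a gap at least this long if events really arrive
     * at the given rate: exp(-rate * time), i.e. the exponential of the
     * negative normalized time since the last event.
     */
    static double significance(double ratePerMinute, double minutesSinceLastEvent) {
        return Math.exp(-ratePerMinute * minutesSinceLastEvent);
    }

    /** Raises the alarm when the gap is less likely than the chosen cutoff. */
    static boolean shouldAlarm(double ratePerMinute, double minutesSinceLastEvent, double cutoff) {
        return significance(ratePerMinute, minutesSinceLastEvent) < cutoff;
    }
}
```

The cutoff is the knob that trades false alarms against late detections; a smaller value alarms less often but later.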
>
>
> On Thu, Mar 17, 2011 at 7:36 AM, Mikkel Meyer Andersen <mikl@mikl.dk> wrote:
>>
>> 2011/3/17 KARR, DAVID (ATTSI) <dk068x@att.com>:
>> >> -----Original Message-----
>> >> From: Mikkel Meyer Andersen [mailto:mikl@mikl.dk]
>> >> Sent: Thursday, March 17, 2011 3:22 AM
>> >> To: Commons Users List
>> >> Subject: Re: [math] Anyone using BetaDistribution and
>> >> BetaDistributionImpl
>> >>
>> >> Hi David,
>> >>
>> >> Yes, I am using the implementation of the beta distribution and am
>> >> quite
>> >> happy with it. Anything in particular you're thinking of that I can
>> >> help you
>> >> with?
>> >
>> > Well, primarily you could help me figure out how to use it :) , but
>> > that's more of an issue with not understanding the statistics, as
>> > opposed to not understanding the API.
>> I would be glad to help you with usage of the API. With regard to the
>> statistics, I can only provide a limited amount of help since
>> this is for commercial use. For thorough statistical consultancy, I
>> can provide assistance through my company (in that case, contact me
>> directly).
>> >
>> > We have a large collection of individual data records that indicate
>> > success/failure of an operation call, where there are a significant
>> > number of possible operations, along with other permutations in the
>> > record that associate it with a different "workflow".  We can get
>> > success/failure ratio of those operations in workflows over particular
>> > time periods (15, 60, 180, 1440 minutes, et cetera), but we want to
>> > process this data over a much larger time period (30 days or more) to
>> > determine what's "normal" for those operations in the various workflows,
>> > essentially building percentage ranges for each of those permutations
>> > that indicate whether a permutation is "green", "orange", or "red".
>> >
>> > I've been told by someone who understands the statistics only a little
>> > better than me that a beta distribution function could help here, but I
>> > won't be able to implement this until we get help from someone who
>> > really understands the statistics here.
>> If the outcome of each call is only either success or failure, the
>> number of successes in n calls follows a Binomial distribution, and
>> inference about the probability parameter (success rate) can be made in
>> several ways. The easiest is to make classical inference in the
>> Binomial distribution. Another option, involving the Beta
>> distribution, is to make a Bayesian analysis (the Beta serves as a
>> prior to the Binomial likelihood, yielding a posterior Beta). So that's
>> probably where the Beta was thought to come into play. If you
>> want to adjust for some covariates (such as the time of day the
>> call happened), you can fit a Binomial regression, e.g. logistic or
>> probit regression. With regression you can test whether the
>> covariates influence the outcome, e.g. whether the success rate is
>> higher for calls made in the morning, or whatever might be the
>> case.
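
The Bayesian option mentioned above comes down to simple arithmetic: a Beta(a, b) prior combined with s successes and f failures gives a Beta(a + s, b + f) posterior. The sketch below shows only that update plus the posterior mean and variance, in plain Java. The class and method names are hypothetical, and the uniform Beta(1, 1) prior used in the usage note is an assumption, not something stated in the thread.

```java
/** Sketch of the Beta-Binomial conjugate update for a success rate. */
final class BetaPosterior {
    final double alpha;
    final double beta;

    BetaPosterior(double alpha, double beta) {
        if (alpha <= 0 || beta <= 0) {
            throw new IllegalArgumentException("alpha and beta must be positive");
        }
        this.alpha = alpha;
        this.beta = beta;
    }

    /** Beta(alpha, beta) prior + observed counts -> Beta(alpha + s, beta + f) posterior. */
    BetaPosterior update(long successes, long failures) {
        if (successes < 0 || failures < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return new BetaPosterior(alpha + successes, beta + failures);
    }

    /** Mean of the distribution: alpha / (alpha + beta). */
    double mean() {
        return alpha / (alpha + beta);
    }

    /** Variance of the distribution: alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)). */
    double variance() {
        double n = alpha + beta;
        return alpha * beta / (n * n * (n + 1));
    }
}
```

For example, `new BetaPosterior(1, 1).update(97, 3)` gives a Beta(98, 4) posterior for a workflow with 97 successes and 3 failures. The end points of a credible interval are quantiles of that posterior, so they need an inverse CDF; as far as I recall, `BetaDistributionImpl` in Commons Math 2.x provides `inverseCumulativeProbability` for this, but check the Javadoc for your version.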
>>
>> Indicator colors can then be assigned from intervals around the normal
>> success rate (confidence intervals for traditional inference, credible
>> intervals for Bayesian): e.g. green if the observed rate is within
>> the 75% interval, orange if it is within the 90% interval, and red otherwise.
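
One possible reading of the color rule above, as a plain-Java sketch: check the narrower interval first, then the wider one, and fall through to red. The interval end points are taken as given inputs (however they were computed), and the class, enum, and parameter names are hypothetical.

```java
/** Sketch of mapping an observed success rate to a status color. */
final class StatusColor {
    enum Color { GREEN, ORANGE, RED }

    /**
     * GREEN if the rate lies in the inner (e.g. 75%) interval, ORANGE if it
     * lies only in the outer (e.g. 90%) interval, RED otherwise.
     * Assumes innerLow..innerHigh is nested inside outerLow..outerHigh.
     */
    static Color classify(double observedRate,
                          double innerLow, double innerHigh,
                          double outerLow, double outerHigh) {
        if (observedRate >= innerLow && observedRate <= innerHigh) {
            return Color.GREEN;
        }
        if (observedRate >= outerLow && observedRate <= outerHigh) {
            return Color.ORANGE;
        }
        return Color.RED;
    }
}
```

Since only low success rates usually matter for monitoring, a one-sided variant that tests just the lower bounds may suit better; that is a design choice the thread leaves open.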
>>
>> Cheers, Mikkel.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> > For additional commands, e-mail: user-help@commons.apache.org
>> >
>> >
>>
>>
>
>
