Apologies... I had an early flight this morning.
Yes. Mikkel is completely correct.
The same basic idea of pooling similar time periods can be used to compute
the current expected success rate. The cumulative beta distribution
can then give you a measure of how exceptional the current time period is. I
would expect that you would need to calibrate the alarm threshold
manually to give an acceptable false positive rate. My guess is that the
rate will actually vary a fair bit, which will make alarms more common
than you might expect.
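
A rough sketch of one way to read the paragraph above, in Python with only
the standard library. It pools comparable past periods into an expected
success rate, then asks how unlikely the current period's success count is
under that rate. It uses the binomial lower tail, which is a cumulative beta
value in disguise: P(X <= k) = I_{1-p}(n-k, k+1) for k < n. The function
names and sample numbers are made up for illustration, and the threshold
still has to be calibrated by hand as described above.

```python
from math import comb


def expected_rate_from_history(history):
    """Pool (successes, trials) pairs from comparable past periods."""
    total_successes = sum(s for s, _ in history)
    total_trials = sum(n for _, n in history)
    return total_successes / total_trials


def binomial_lower_tail(successes, trials, expected_rate):
    """P(X <= successes) for X ~ Binomial(trials, expected_rate).

    Small values mean the current period has suspiciously few successes.
    """
    return sum(
        comb(trials, k) * expected_rate**k * (1 - expected_rate) ** (trials - k)
        for k in range(successes + 1)
    )


# Hypothetical data: the same hour on the same weekday in four recent weeks.
history = [(95, 100), (188, 200), (142, 150), (97, 100)]
rate = expected_rate_from_history(history)

# Current period: 80 successes out of 100 calls.
score = binomial_lower_tail(80, 100, rate)
print(f"expected rate {rate:.3f}, lower-tail probability {score:.2e}")
```

Treat the score as a ranking of how odd a period looks rather than as a
literal probability, since the underlying rate is not really constant.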
The Poisson method can still be used because the rate of incoming traffic is
probably pretty well predicted from the history. That means that
the rate of successful conversions is a useful proxy for the ratio.
The Poisson method has the advantage that the alarm can be raised sooner
than is easy to do with the binomial alarm.
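
A minimal sketch of the time-since-last-event alarm described in my earlier
message (quoted below), applied to conversions. It assumes the rate is
estimated by averaging counts from the same hour of the week in recent
history and that events within the hour are roughly Poisson. The function
names and numbers are hypothetical.

```python
from math import exp


def hourly_rate(counts_same_hour):
    """Average events per hour over like periods (same hour, same weekday)."""
    return sum(counts_same_hour) / len(counts_same_hour)


def silence_p_value(seconds_since_last, rate_per_hour):
    """Probability of a gap at least this long under a Poisson process.

    The elapsed time is normalized by the expected rate, so the result is
    exp(-expected number of events in the gap).
    """
    normalized = seconds_since_last * rate_per_hour / 3600.0
    return exp(-normalized)


# Hypothetical counts for this hour of the week over the last four weeks.
rate = hourly_rate([110, 130, 125, 115])

# Ten minutes with no conversion at all.
p = silence_p_value(600, rate)
print(f"rate {rate:.1f}/hour, p-value for 10 minutes of silence {p:.2e}")
```

The cutoff on this value is what sets the balance between false alarms and
late detections.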
If you really want to test conversion rate and need a prompt alarm, then I
would use the number of raw impressions that have been seen
since the last conversion as the key test statistic. This is naturally
adjusted for the rate of impressions and you don't even really
have to correlate impressions with conversions to get a good statistic. You
may want to adjust the alarm threshold by time of day and day
of week to deal with conversion rate variability. The theoretical
distribution underlying this alarm is the negative binomial distribution.
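
A small sketch of that impressions-since-last-conversion statistic, assuming
each impression converts independently with a known expected rate. Waiting
for a single conversion is the negative binomial with one success (the
geometric case), so the chance of n impressions in a row with no conversion
is (1 - p)^n. The function names and the example rate are hypothetical.

```python
def no_conversion_p_value(impressions_since_last, conversion_rate):
    """Probability of this many impressions in a row without a conversion."""
    return (1.0 - conversion_rate) ** impressions_since_last


def impressions_threshold(conversion_rate, false_alarm_prob):
    """Smallest n with (1 - conversion_rate)**n <= false_alarm_prob."""
    if not 0.0 < conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be in (0, 1]")
    if not 0.0 < false_alarm_prob < 1.0:
        raise ValueError("false_alarm_prob must be in (0, 1)")
    n = 0
    prob = 1.0
    while prob > false_alarm_prob:
        prob *= 1.0 - conversion_rate
        n += 1
    return n


# Hypothetical: 2% expected conversion rate for this hour of the week.
limit = impressions_threshold(0.02, 1e-4)
print(f"raise the alarm after {limit} impressions without a conversion")
```

Looking up the conversion rate per hour of the week is one way to do the
threshold adjustment mentioned above.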
On Thu, Mar 17, 2011 at 2:28 PM, Mikkel Meyer Andersen <mikl@mikl.dk> wrote:
> Hi,
>
> I agree that Poisson would be the obvious choice if what being modeled
> is the number of events in a given period of time. But as far as I
> understood, the success rate is also interesting (assuming two types
> of events: successes and failures). So which model to use really does
> depend on what you want to make inference about.
>
> Cheers, Mikkel.
>
> 2011/3/17 Ted Dunning <ted.dunning@gmail.com>:
> >
> > The beta distribution can help with your problem of marking a system as
> > working or not, but I think that a better approach is the Poisson
> > distribution which is intended to describe events occurring in time.
> > The rub is that almost all processes worth monitoring have highly
> > variable rates and all of the simple statistical models presume a
> > constant rate.
> > If your rate variations are predictable and you like the plush leather,
> > very clever, very correct approach, you can use variants on Poisson
> > regression.
> > If you want to get results very simply and your major source of
> > variation is time of day and day of week, you can get a good
> > approximation by simply totalling the number of events for the same hour
> > on the same day of the week in recent history.
> > This gives you a rate which is the fundamental parameter for the Poisson
> > distribution.
> > The simplest alarm is based on the time since last event. That time
> > should be scaled by the current rate as estimated by the average rate of
> > like periods. You can compute the significance of this by taking the
> > exponential of the negative normalized time since last event or you can
> > build a table of normalized times between events and set your cutoff to
> > give you a desired balance of false alarms and late detections. I
> > usually set such an alarm to give a false alarm about once a month for
> > system operators and about once a year for CEO's.
> >
> >
> > On Thu, Mar 17, 2011 at 7:36 AM, Mikkel Meyer Andersen <mikl@mikl.dk> wrote:
> >>
> >> 2011/3/17 KARR, DAVID (ATTSI) <dk068x@att.com>:
> >> >> Original Message
> >> >> From: Mikkel Meyer Andersen [mailto:mikl@mikl.dk]
> >> >> Sent: Thursday, March 17, 2011 3:22 AM
> >> >> To: Commons Users List
> >> >> Subject: Re: [math] Anyone using BetaDistribution and
> >> >> BetaDistributionImpl
> >> >>
> >> >> Hi David,
> >> >>
> >> >> Yes, I am using the implementation of the beta distribution and am
> >> >> quite happy with it. Anything in particular you're thinking of that
> >> >> I can help you with?
> >> >
> >> > Well, primarily you could help me figure out how to use it :) , but
> >> > that's more of an issue with not understanding the statistics, as
> >> > opposed to not understanding the API.
> >> I would love to help you with usage of the API. With regard to
> >> statistics, I can provide only a limited amount of help since
> >> this is for commercial usage. For thorough statistical consultancy, I
> >> can provide assistance through my company (in that case, contact me
> >> directly).
> >> >
> >> > We have a large collection of individual data records that indicate
> >> > success/failure of an operation call, where there are a significant
> >> > number of possible operations, along with other permutations in the
> >> > record that associate it with a different "workflow". We can get
> >> > success/failure ratio of those operations in workflows over particular
> >> > time periods (15, 60, 180, 1440 minutes, et cetera), but we want to
> >> > process this data over a much larger time period (30 days or more) to
> >> > determine what's "normal" for those operations in the various
> >> > workflows, essentially building percentage ranges for each of those
> >> > permutations that indicate whether a permutation is "green",
> >> > "orange", or "red".
> >> >
> >> > I've been told by someone who understands the statistics only a little
> >> > better than me that a beta distribution function could help here, but
> >> > I won't be able to implement this until we get help from someone who
> >> > really understands the statistics here.
> >> If your outcome of each call is only either success or failure, the
> >> number of successes in n calls is so-called Binomial distributed, and
> >> inference of the probability parameter (success rate) can be made in
> >> several ways. The easiest is to make classical inference in the
> >> Binomial distribution. Another option, involving the Beta
> >> distribution, is to make a Bayesian analysis (the Beta serves as a
> >> prior to the Binomial likelihood, yielding a posterior Beta). So that's
> >> probably where the Beta has been thought to come into play. If you
> >> want to adjust for some covariates (such as the time of the day the
> >> call happened), you can make Binomial regression, e.g. logistic or
> >> probit regression. Doing regression you can test to see if the
> >> covariates have influence on the outcome, e.g. if the calls happened
> >> in the morning, the success rate is higher or whatever might be the
> >> case.
> >>
> >> Indicator colors can then be assigned, e.g. green if the success rate
> >> is within the 75% interval around the normal rate (confidence interval
> >> for the traditional approach, credible interval for the Bayesian),
> >> orange if it is within the 90% interval, and red otherwise.
> >>
> >> Cheers, Mikkel.
> >> >
> >> > 
> >> > To unsubscribe, email: userunsubscribe@commons.apache.org
> >> > For additional commands, email: userhelp@commons.apache.org
> >> >
> >> >
> >>
> >>
> >
> >
>
