mahout-user mailing list archives

From Srivathsan Srinivas <>
Subject Re: outlier detection in time-series using Mahout
Date Mon, 01 Nov 2010 15:54:20 GMT
Dear Ted,

Thanks for pointing me to the Dirichlet mixture model. I shall look into that.

Basically, I am looking into the autocorrelation function, control charts,
moving averages, population stability, and Poisson regression (much of the
data consists of daily or hourly counts). I'd like to build a tool that
would blend these approaches into a scorecard for proactive alerting for any
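
As a rough illustration of the scorecard idea, here is a minimal Python sketch that blends two of the approaches listed above: a control chart around a trailing moving average, and a Poisson tail test for count data. Each point's score is the number of rules that fire. The window of 24 (one day of hourly counts), the 3-sigma limit, the 0.001 tail threshold, and the function names are all hypothetical choices; autocorrelation, population stability, and Poisson regression are not covered here.

```python
import math
from statistics import mean, pstdev


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); pmf terms are built in log space."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0

    def pmf(i):
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # At or below the mean: 1 - P(X <= k - 1), summing downward.
        term, lower = pmf(k - 1), 0.0
        for i in range(k - 1, -1, -1):
            lower += term
            term *= i / lam  # P(X = i - 1) = P(X = i) * i / lam
        return max(0.0, 1.0 - lower)
    # Above the mean: sum P(X = i) for i >= k; terms shrink geometrically.
    term, upper, i = pmf(k), 0.0, k
    while True:
        upper += term
        i += 1
        term *= lam / i  # P(X = i) = P(X = i - 1) * lam / i
        if term <= 1e-15 * upper:
            return min(1.0, upper)


def score_series(counts, window=24, z_limit=3.0, p_limit=1e-3):
    """Return (index, score) for each point after the warm-up window.

    score = number of rules that fire (0, 1 or 2).
    """
    scores = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]
        mu = mean(history)       # trailing moving average
        sigma = pstdev(history)  # trailing population standard deviation
        x = counts[t]
        # Rule 1: Shewhart-style control chart, x outside mu +/- z_limit * sigma.
        control = sigma > 0 and abs(x - mu) > z_limit * sigma
        # Rule 2: Poisson tail test on spikes, treating mu as the expected count.
        spike = x > mu and poisson_upper_tail(x, mu) < p_limit
        scores.append((t, int(control) + int(spike)))
    return scores
```

Each additional approach (for example, a population stability check) could be added as another boolean rule in the same loop, so the scorecard stays a simple count of rules that fire.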

For the above, I am interested in seeing how the time-series data can be
broken into manageable segments and distributed to different machines in
a Hadoop cluster.
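
One common way to split a series for distribution is to use fixed-length segments that each carry a little overlapping history. With that overlap, every trailing-window statistic can be computed on a single machine without reading neighbouring segments. The Python sketch below assumes this scheme. `make_segments`, `trailing_means`, and the series id `"sensor-1"` are hypothetical names, and nothing here uses the Hadoop or Mahout APIs; in a real job, a mapper would emit these records keyed by `(series_id, start)`.

```python
from statistics import mean


def make_segments(series_id, values, segment_len, overlap):
    """Split one series into ((series_id, start), offset, chunk) records.

    Each chunk owns values[start:start + segment_len] and also carries up to
    `overlap` points of preceding history, so a trailing-window computation
    can run on every chunk independently (e.g. in separate map tasks).
    `offset` is the index of chunk[0] in the original series.
    """
    if segment_len <= 0 or overlap < 0:
        raise ValueError("segment_len must be > 0 and overlap must be >= 0")
    records = []
    for start in range(0, len(values), segment_len):
        offset = max(0, start - overlap)
        records.append(((series_id, start), offset, values[offset:start + segment_len]))
    return records


def trailing_means(offset, chunk, window):
    """Stand-in per-chunk job: trailing mean at every point with a full window."""
    return {offset + t: mean(chunk[t - window:t]) for t in range(window, len(chunk))}


# Usage sketch: with overlap == window, merging the per-chunk results
# reproduces the single-machine computation exactly.
values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
merged = {}
for key, offset, chunk in make_segments("sensor-1", values, segment_len=4, overlap=3):
    merged.update(trailing_means(offset, chunk, window=3))
assert merged == trailing_means(0, values, window=3)
```

The overlap must be at least the longest lookback window used by any per-segment rule. With a shorter overlap, points near the start of each segment lack a full history and go unscored.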

Thanks again,

On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <> wrote:

> There is nothing explicit in Mahout for this, but you could use the
> Dirichlet mixture model clustering to do this.
> The idea would be to express your different observed time series or short
> segments of time sequences as mixture models and then find regions that
> are not well described by this mixture model. Ideally, you would have a
> Markov model underneath the mixture coefficients, but that is out of scope
> for what Mahout does for you right off the bat. It wouldn't be too hard to
> merge the HMM code and the DP clustering to get this, though.
> So the answer is no.
> But Mahout would be a decent substrate for building your own.
> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <> wrote:
> > Hi,
> > Any pointers to techniques or papers for detecting outliers in time
> > series of very large data sets using Mahout? I am interested in seeing
> > which techniques are well suited to large-scale distributed systems
> > using Hadoop/Mahout.
> >
> > Thanks,
> > Sri.
> >
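
The mixture-model idea from the reply above can be sketched as follows: summarise each segment by a feature, fit a mixture to historical segments, and flag new segments with low log-likelihood under the fit. Mahout's Dirichlet process clustering chooses the number of components itself. This sketch instead uses a plain finite Gaussian mixture with a fixed `k`, fitted by EM in pure Python. It is not Mahout's API and has no Markov layer. The one-dimensional feature (a segment's mean count), the function names, and the assumption that the historical segments are mostly normal are all hypothetical.

```python
import math


def log_normal(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def log_sum_exp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_gmm_1d(xs, k=2, iters=200, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture to xs with plain EM (finite k, not a DP)."""
    n = len(xs)
    s = sorted(xs)
    # Deterministic init: component means at evenly spaced quantiles.
    mus = [s[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    mean_all = sum(xs) / n
    variances = [max(sum((x - mean_all) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities, normalised in log space to avoid underflow.
        resp = []
        for x in xs:
            lp = [math.log(w) + log_normal(x, m, v) for w, m, v in zip(weights, mus, variances)]
            z = log_sum_exp(lp)
            resp.append([math.exp(l - z) for l in lp])
        # M-step: weighted re-estimates; floors keep every component alive.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    return weights, mus, variances


def log_likelihood(x, model):
    """Log density of x under the fitted mixture; low values = poorly described."""
    weights, mus, variances = model
    return log_sum_exp([math.log(w) + log_normal(x, m, v) for w, m, v in zip(weights, mus, variances)])


# Usage sketch: one feature per segment, here hypothetically its mean count.
history = [9, 10, 10, 11, 12, 49, 50, 50, 51, 52]
model = fit_gmm_1d(history, k=2)
for seg_mean in (10.5, 30.0, 200.0):
    print(seg_mean, round(log_likelihood(seg_mean, model), 2))
```

A fixed `k` is the main simplification here. A Dirichlet process prior, as in Mahout's clustering, would let the data decide how many regimes (for example, weekday versus weekend traffic) the mixture needs.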
