Dear Ted,
Thanks for pointing me to the Dirichlet mixture model. I will look into it.
Basically, I am looking into autocorrelation functions, control charts,
moving averages, population stability, and Poisson regression (much of the
data can be described as daily or hourly counts). I'd like to build a tool
that blends these approaches into a scorecard for proactive alerting on
outliers.
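
To make the control-chart and moving-average part concrete, here is a minimal sketch in Python. It assumes the counts are roughly Poisson, so the variance is taken to equal the trailing mean. The function name, the 24-point window, and the 3-sigma limit are illustrative choices only, not anything Mahout provides.

```python
import math
from collections import deque

def poisson_control_chart(counts, window=24, k=3.0):
    """Return indices whose count lies outside mean +/- k*sqrt(mean),
    where mean is the moving average of the previous `window` counts."""
    history = deque(maxlen=window)  # trailing window of past counts
    alerts = []
    for i, c in enumerate(counts):
        if len(history) == window:
            mean = sum(history) / window
            sigma = math.sqrt(mean)  # Poisson assumption: variance == mean
            if abs(c - mean) > k * sigma:
                alerts.append(i)
        history.append(c)  # the deque drops the oldest value automatically
    return alerts

hourly = [10] * 24 + [10, 30, 10]
print(poisson_control_chart(hourly))
```

Note that a flagged point still enters the trailing window in this sketch, so a real scorecard would probably want to exclude or down-weight it.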
For all of these approaches, I am interested in seeing how the time-series
data can be broken into manageable segments and distributed to different
machines in a Hadoop cluster.
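
One possible way to do the segmentation, sketched in plain Python rather than against any Hadoop or Mahout API: cut the series into fixed-length segments that overlap by a few points, so that a windowed statistic can be computed near a segment boundary without talking to another machine. The function name and parameters are hypothetical. In a MapReduce job the segment id could serve as the key.

```python
def split_into_segments(series, segment_len, overlap):
    """Return a list of (segment_id, start_index, values) tuples.
    Each segment repeats the last `overlap` points of the previous one."""
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_len")
    step = segment_len - overlap
    segments = []
    for seg_id, start in enumerate(range(0, len(series), step)):
        segments.append((seg_id, start, series[start:start + segment_len]))
        if start + segment_len >= len(series):
            break  # this segment already reaches the end of the series
    return segments

for seg in split_into_segments(list(range(10)), segment_len=4, overlap=1):
    print(seg)
```

The overlap should be at least as long as the longest look-back window used by the per-segment detector, otherwise the first points of each segment cannot be scored.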
Thanks again,
Sri.
On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> There is nothing explicit in Mahout for this, but you could use the
> Dirichlet mixture model clustering to do this.
>
> The idea would be to express your different observed time series or short
> segments of time sequences as mixture models and then find regions that
> are not well described by this mixture model. Ideally, you would have a
> Markov model underneath the mixture coefficients, but that is out of scope
> for what Mahout does for you right off the bat. It wouldn't be too hard to
> merge the HMM code and the DP clustering to get this, though.
>
> So the answer is no.
>
> But Mahout would be a decent substrate for building your own.
>
> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <
> srivathsan.srinivas@gmail.com> wrote:
>
> > Hi,
> >    Any pointers to techniques/papers that detect outliers in time series
> > of very large data sets using Mahout? I am interested in seeing what
> > techniques are favorable for use in large-scale distributed systems using
> > Hadoop/Mahout.
> >
> > Thanks,
> > Sri.
> >
>