mahout-user mailing list archives

From Federico Castanedo <fcast...@inf.uc3m.es>
Subject Re: outlier detection in time-series using Mahout
Date Wed, 03 Nov 2010 20:22:07 GMT
Hi,

2010/11/1 Srivathsan Srinivas <srivathsan.srinivas@gmail.com>:
> Dear Ted,
>
> Thanks for pointing to the Dirichlet mixture model. I shall look into that.
>
> Basically, I am looking into the autocorrelation function, Control Charts,
> Moving Average, Population Stability, and Poisson regression (much of the
> data can be described as daily|hourly counts). I’d like to build a tool that
> would blend these approaches into a scorecard for proactive alerting on any
> outliers...
>
> For the above, I am interested in seeing how the time-series data can be
> broken into manageable segments and distributed to different machines in
> a Hadoop network.
>
I've never seen anything similar in Hadoop, but a good algorithm for
segmenting time series is:

Sliding Window And Bottom-Up (SWAB), from Keogh et al. Here is the paper:

http://www.cs.ucr.edu/~eamonn/icdm-01.pdf

and here is a presentation:
www-scf.usc.edu/~selinach/segmentation-slides.pd
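
To give a rough idea of the segmentation step, here is a minimal Python
sketch of plain bottom-up piecewise-linear segmentation. It is only the
bottom-up half: SWAB itself runs a step like this inside a sliding buffer
so that it can work online. The function names, the least-squares error
measure, and the max_error threshold are my own assumptions for
illustration, not code from the paper or from Mahout.

```python
def linear_fit_error(values, start, end):
    """Sum of squared residuals of a least-squares line over values[start:end]."""
    n = end - start
    if n < 3:
        return 0.0  # one or two points are always fit exactly by a line
    ys = values[start:end]
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(ys))


def bottom_up_segments(values, max_error):
    """Return half-open (start, end) index ranges covering the series.

    max_error is a hypothetical tuning parameter: the largest squared
    error tolerated for a single merged segment.
    """
    # Start from the finest segmentation: runs of two adjacent points.
    bounds = list(range(0, len(values), 2)) + [len(values)]
    segments = [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
    while len(segments) > 1:
        # Cost of merging each pair of neighbouring segments.
        costs = [
            linear_fit_error(values, segments[i][0], segments[i + 1][1])
            for i in range(len(segments) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break  # the cheapest merge is already too lossy, so stop
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments
```

Each resulting segment is just an index range, so segments (or groups of
them) could in principle be handed to separate workers, although I have
not tried that on Hadoop.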


> Thanks again,
> Sri.
>
>
> On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> There is nothing explicit in Mahout for this, but you could use the
>> Dirichlet mixture model clustering to do this.
>>
>> The idea would be to express your different observed time series or short
>> segments of time sequences as mixture
>> models and then find regions that are not well described by this mixture
>> model.  Ideally, you would have a Markov
>> model underneath the mixture coefficients, but that is out of scope for
>> what
>> Mahout does for you right off the bat.  It
>> wouldn't be too hard to merge the HMM code and the DP clustering to get
>> this, though.
>>
>> So the answer is no.
>>
>> But Mahout would be a decent substrate for building your own.
>>
>> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <
>> srivathsan.srinivas@gmail.com> wrote:
>>
>> > Hi,
>> >       Any pointers to techniques/papers that detect outliers in
>> > time series of very large data sets using Mahout? I am interested in
>> > seeing what
>> > techniques are favorable for use in large-scale distributed systems using
>> > Hadoop/Mahout.
>> >
>> > Thanks,
>> > Sri.
>> >
>>
>
