Hi,
2010/11/1 Srivathsan Srinivas <srivathsan.srinivas@gmail.com>:
> Dear Ted,
>
> Thanks for pointing me to the Dirichlet mixture model. I shall look into that.
>
> Basically, I am looking into the autocorrelation function, control charts,
> moving averages, population stability, and Poisson regression (much of the
> data can be described as daily/hourly counts). I'd like to build a tool that
> would blend these approaches into a scorecard for proactive alerting on any
> outliers...
>
> For the above, I am interested in seeing how the time series data can be
> broken into manageable segments and distributed to different machines in
> a Hadoop network.
>
I've never seen anything similar in Hadoop, but a good algorithm for
segmenting time series is Sliding Window And Bottom-Up (SWAB) from
Keogh et al. Here is the paper:
http://www.cs.ucr.edu/~eamonn/icdm01.pdf
and here is a presentation:
wwwscf.usc.edu/~selinach/segmentationslides.pd
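
To give a rough idea, here is a minimal Python sketch of the bottom-up
half only. It is not the full SWAB and not code from the paper: SWAB runs
a bottom-up pass like this over a sliding buffer so it can work online.
The names fit_error, bottom_up and max_error are my own, and I am assuming
a least-squares line per segment with the sum of squared residuals as the
merge cost.

```python
import numpy as np


def fit_error(y, start, end):
    """Sum of squared residuals of a least-squares line over y[start:end]."""
    seg = y[start:end]
    if len(seg) < 3:
        # One or two points are always fitted exactly by a line.
        return 0.0
    x = np.arange(len(seg))
    coeffs = np.polyfit(x, seg, 1)
    resid = seg - np.polyval(coeffs, x)
    return float(np.dot(resid, resid))


def bottom_up(y, max_error):
    """Return half-open (start, end) index pairs of piecewise-linear segments.

    Starts from the finest segmentation (two points per segment) and keeps
    merging the cheapest adjacent pair until the cheapest merge would
    exceed max_error. Costs are recomputed on every pass, which is simple
    but quadratic; a real implementation would update them incrementally.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n == 0:
        return []
    bounds = list(range(0, n, 2)) + [n]
    while len(bounds) > 2:
        # Cost of merging segment i with segment i + 1.
        costs = [fit_error(y, bounds[i], bounds[i + 2])
                 for i in range(len(bounds) - 2)]
        i = int(np.argmin(costs))
        if costs[i] > max_error:
            break
        del bounds[i + 1]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
```

The segments are just index ranges, so each one could be handed to a
separate worker. How that maps onto Hadoop jobs is not something I have
tried.
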
> Thanks again,
> Sri.
>
>
> On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> There is nothing explicit in Mahout for this, but you could use the
>> Dirichlet mixture model clustering to do this.
>>
>> The idea would be to express your different observed time series or short
>> segments of time sequences as mixture models and then find regions that are
>> not well described by this mixture model. Ideally, you would have a Markov
>> model underneath the mixture coefficients, but that is out of scope for what
>> Mahout does for you right off the bat. It wouldn't be too hard to merge the
>> HMM code and the DP clustering to get this, though.
>>
>> So the answer is no.
>>
>> But Mahout would be a decent substrate for building your own.
>>
>> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <
>> srivathsan.srinivas@gmail.com> wrote:
>>
>> > Hi,
>> > Any pointers to techniques/papers that detect outliers in time series
>> > of very large data sets using Mahout? I am interested in seeing what
>> > techniques are favorable for use in large-scale distributed systems using
>> > Hadoop/Mahout.
>> >
>> > Thanks,
>> > Sri.
>> >
>>
>
