I tried the shapelet approach for video signature generation once upon a
time and was not enormously impressed with the accuracy/recall tradeoffs.
I expect that this was partly due to my own deficient implementation, but
I really do think that there may be better approaches, such as vector
quantization of a state space of some kind.
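
To make the vector quantization idea a bit more concrete, here is a minimal
sketch. It is an illustration only, not the implementation mentioned above. It
assumes that the "state space" is simply the set of sliding windows over the
series, that the codebook comes from a plain k-means, and that a large
quantization error marks an unusual window. The function names, the window
width of 8 and the codebook size of 16 are all made up for the example.

```python
import numpy as np

def sliding_windows(series, width):
    """Stack every length-`width` window of a 1-D series as a row."""
    series = np.asarray(series, dtype=float)
    return np.stack([series[i:i + width] for i in range(len(series) - width + 1)])

def pairwise_distances(vectors, codebook):
    """Euclidean distance from every vector to every codebook entry."""
    return np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)

def fit_codebook(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the codebook)."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iterations):
        labels = pairwise_distances(vectors, codebook).argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                codebook[j] = members.mean(axis=0)
    return codebook

def quantize(vectors, codebook):
    """Return (nearest code index, quantization error) for each vector."""
    dists = pairwise_distances(vectors, codebook)
    return dists.argmin(axis=1), dists.min(axis=1)

# Learn a codebook from a clean periodic signal (synthetic data).
t = np.arange(400)
clean = np.sin(2 * np.pi * t / 50)
codebook = fit_codebook(sliding_windows(clean, 8), k=16)

# Quantize a copy with one injected spike and look for the worst-fitting window.
noisy = clean.copy()
noisy[200] += 5.0
codes, errors = quantize(sliding_windows(noisy, 8), codebook)
worst = int(errors.argmax())  # start index of the window with the largest error
```

The sequence of code indices in `codes` can serve as a compact signature of the
series, while `errors` gives a crude per-window outlier score. On Hadoop the
quantization step is an independent per-window computation, so it would map
naturally onto a map-only job once the codebook has been distributed.
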
On Wed, Nov 3, 2010 at 6:02 PM, Srivathsan Srinivas <
srivathsan.srinivas@gmail.com> wrote:
> Thanks. I am reading a recent paper of Keogh's, "Time series shapelets:
> a novel technique that allows accurate, interpretable and fast
> classification", published by Springer in Data Mining and Knowledge
> Discovery, 18 June 2010.
>
> I am basically skimming several papers with different ideas to see
> what can be easily and efficiently parallelized using Hadoop...
>
> Thanks much for pointing to the presentation and the paper.
>
> Srinivas.
>
> On Wednesday, November 3, 2010, Federico Castanedo <fcastane@inf.uc3m.es>
> wrote:
> > Hi,
> >
> > 2010/11/1 Srivathsan Srinivas <srivathsan.srinivas@gmail.com>:
> >> Dear Ted,
> >>
> >> Thanks for pointing to the Dirichlet mixture model. I shall look into that.
> >>
> >> Basically, I am looking into the autocorrelation function, control charts,
> >> moving averages, population stability, and Poisson regression (much of the
> >> data can be described as daily/hourly counts). I'd like to build a tool that
> >> would blend these approaches into a scorecard for proactive alerting on any
> >> outliers...
> >>
> >> For the above, I am interested in seeing how the time series data can be
> >> broken into manageable segments and distributed off to different machines in
> >> a Hadoop network.
> >>
> > I've never seen anything similar in Hadoop, but my suggestion for a
> > good algorithm for segmenting time series is:
> >
> > Sliding Window And Bottom-Up (SWAB) from Keogh et al. Here is the paper:
> >
> > http://www.cs.ucr.edu/~eamonn/icdm01.pdf
> >
> > and here a presentation:
> > wwwscf.usc.edu/~selinach/segmentationslides.pd
> >
> >
> >> Thanks again,
> >> Sri.
> >>
> >>
> >> On Mon, Nov 1, 2010 at 10:21 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >>
> >>> There is nothing explicit in Mahout for this, but you could use the
> >>> Dirichlet mixture model clustering to do this.
> >>>
> >>> The idea would be to express your different observed time series or short
> >>> segments of time sequences as mixture models and then find regions that
> >>> are not well described by this mixture model. Ideally, you would have a
> >>> Markov model underneath the mixture coefficients, but that is out of scope
> >>> for what Mahout does for you right off the bat. It wouldn't be too hard to
> >>> merge the HMM code and the DP clustering to get this, though.
> >>>
> >>> So the answer is no.
> >>>
> >>> But Mahout would be a decent substrate for building your own.
> >>>
> >>> On Mon, Nov 1, 2010 at 8:02 AM, Srivathsan Srinivas <
> >>> srivathsan.srinivas@gmail.com> wrote:
> >>>
> >>> > Hi,
> >>> > Any pointers to techniques/papers that detect outliers in time series
> >>> > of very large data sets using Mahout? I am interested in seeing what
> >>> > techniques are favorable for use in large-scale distributed systems using
> >>> > Hadoop/Mahout.
> >>> >
> >>> > Thanks,
> >>> > Sri.
> >>> >
> >>>
> >>
> >
>
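
For reference, the bottom-up half of the segmentation idea suggested in the
quoted thread can be sketched roughly as follows. This is a simplified
illustration, not the SWAB algorithm from the paper: it leaves out the sliding
window buffer entirely, recomputes every merge cost on each pass, and assumes a
least-squares line as the segment model. The function names and the
`max_error` threshold are hypothetical.

```python
import numpy as np

def fit_cost(values):
    """Sum of squared residuals of a least-squares line through `values`."""
    if len(values) < 3:
        return 0.0  # a line fits one or two points exactly
    x = np.arange(len(values))
    slope, intercept = np.polyfit(x, values, 1)
    return float(np.sum((values - (slope * x + intercept)) ** 2))

def bottom_up_segments(series, max_error):
    """Greedily merge adjacent segments while the cheapest merge costs at most max_error.

    Returns a list of (start, end) index pairs, with `end` exclusive.
    """
    series = np.asarray(series, dtype=float)
    # Start from the finest segmentation: consecutive pairs of points.
    bounds = [(i, min(i + 2, len(series))) for i in range(0, len(series), 2)]
    while len(bounds) > 1:
        # Cost of merging each segment with its right-hand neighbour.
        costs = [fit_cost(series[a[0]:b[1]]) for a, b in zip(bounds, bounds[1:])]
        best = int(np.argmin(costs))
        if costs[best] > max_error:
            break  # every remaining merge would be too lossy
        bounds[best:best + 2] = [(bounds[best][0], bounds[best + 1][1])]
    return bounds

# Synthetic example: a ramp up followed by a ramp down.
series = np.concatenate([np.linspace(0, 10, 50), np.linspace(10, 0, 50)])
segments = bottom_up_segments(series, max_error=1e-6)
```

Each resulting segment is a self-contained slice of the series, which is one
plausible unit of work to hand to different machines in a Hadoop job.
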
