mahout-user mailing list archives

From Florent Empis <florent.em...@gmail.com>
Subject Re: Beginner questions on clustering & M/R
Date Sat, 17 Jul 2010 21:11:49 GMT
Hi,

On the SVD part... why would that help?

Thanks for your input :)

Florent

2010/7/15 Ted Dunning <ted.dunning@gmail.com>

> Clustering of time series data is usually better done in an abstract,
> relatively low-dimensional coordinate space based on some transform like a
> locality-sensitive frequency transform.  Gabor transforms might be
> appropriate.
>
> You might be able to get away with something like an SVD of your daily
> change data.
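
A rough sketch of one way to read the SVD suggestion, not a statement of what was meant: if each stock is a row of daily changes, the leading singular direction is the day-by-day pattern that the rows share most strongly, and projecting each stock onto it replaces n daily values with one coordinate. The code below approximates that direction with power iteration instead of a full SVD, using only the standard library. The function names and the three made-up stocks are assumptions for illustration.

```python
import math

def leading_direction(rows, iterations=50):
    """Approximate the leading right singular vector of `rows` by power iteration.

    Assumes the start vector (1, 0, ..., 0) is not orthogonal to that
    direction and that the matrix is not all zeros.
    """
    n = len(rows[0])
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(iterations):
        # scores = A v : how strongly each stock follows the current pattern
        scores = [sum(a * b for a, b in zip(row, v)) for row in rows]
        # w = A^T scores : re-estimate the pattern from the weighted rows
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(rows, v):
    """One coordinate per stock: its daily changes projected onto the pattern v."""
    return [sum(a * b for a, b in zip(row, v)) for row in rows]

# Hypothetical daily changes (percent) for three made-up stocks over four days.
daily_changes = [
    [1.0, -1.0, 1.0, -1.0],   # stock A
    [2.0, -2.0, 2.0, -2.0],   # stock B: same pattern as A, twice as strong
    [-1.0, 1.0, -1.0, 1.0],   # stock C: moves against A
]
pattern = leading_direction(daily_changes)
coords = project(daily_changes, pattern)
```

Here A and B land on the same side of zero and C on the other, so a clustering step could work on one coordinate per stock rather than four. With real data one would presumably keep several directions, not just the first.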
>
> On Thu, Jul 15, 2010 at 7:51 AM, Florent Empis <florent.empis@gmail.com> wrote:
>
> > Hi,
> >
> > I want to learn more about clustering techniques. I have skimmed through
> > Programming Collective Intelligence and Mahout in Action in the past but
> > I don't have them on hand at the moment... :(
> > I've seen Isabel Drost's mail about test data on http://mldata.org/about/
> > I've had the idea of using
> > http://mldata.org/repository/view/stockvalues/ for a pet project.
> > My idea is as follows: can we see common behaviour between companies'
> > stock values?
> > I would expect to end up with clusters of banking sector shares, utilities
> > shares, media etc... and maybe some more unexpected clusters, who knows?
> >
> > My idea is basically:
> > 1°) Transform the dataset from values to daily variations, expressed as a
> > percentage drop/rise (the data is then normalized)
> > 2°) Apply clustering technique(s)
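
Step 1 above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming each stock comes as a list of consecutive daily prices with no gaps or zero values. The ticker names and prices are invented.

```python
def daily_changes(prices):
    """Percentage change between each pair of consecutive daily prices."""
    return [100.0 * (today - yesterday) / yesterday
            for yesterday, today in zip(prices, prices[1:])]

# Hypothetical closing prices for two made-up tickers.
prices = {
    "AAA": [100.0, 110.0, 99.0],
    "BBB": [50.0, 55.0, 49.5],
}
changes = {name: daily_changes(series) for name, series in prices.items()}
```

Both series come out as a 10% rise followed by a 10% drop, even though their price levels differ by a factor of two, which is the sense in which the transform normalizes the data.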
> >
> > The issue may seem silly but, as I understand it, clustering happens in a
> > space of 2 (or more) dimensions.
> > I know I have 2 dimensions: variation and time, but I can't wrap my head
> > around the problem...
> >
> > I *think* that the K-Means example does exactly what I intend to do in my
> > second step, is this correct?
> > However, I can't grasp what the 2-dimensional display represents exactly:
> > what are the x and y axes?
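
One possible way to look at the dimension question, offered as a sketch and not as a description of the Mahout example: treat each stock as one point and each day's variation as one coordinate, so n days give a point in n-dimensional space and a 2-D plot would correspond to just two days (or two derived coordinates). The toy k-means below follows that reading. It is plain Python, it seeds the centroids with the first k points (an assumption made to keep the run deterministic), and the four stocks are invented.

```python
import math

def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def kmeans(points, k, iterations=10):
    """Toy k-means: returns (cluster index per point, list of centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive seeding, for illustration
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: distance(p, centroids[c]))
                      for p in points]
        # Move each centroid to the mean of its members (if it has any).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment, centroids

# Hypothetical stocks: one row per stock, one column per day (percent change).
points = [
    [1.0, -2.0, 0.5],
    [5.0, 4.0, -3.0],
    [1.2, -1.8, 0.4],
    [4.8, 4.1, -2.9],
]
assignment, centroids = kmeans(points, 2)
```

In this toy run the first and third stocks end up in one cluster and the second and fourth in the other, because their three-day change vectors are close to each other.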
> >
> > Added question: I am fairly new to the M/R paradigm, but let's say I
> > would like to do step 1 (data normalization) in a M/R fashion. Would the
> > following be a good idea:
> > My data is a matrix of k stock values S over n intervals of time.
> > I denote the first stock in the file, at the first and second periods, as:
> > S1,t & S1,t+1 ...
> >
> > Map step:
> >   input:  ((S1,t ... S1,t+n), ..., (Sk,t ... Sk,t+n))
> >   output: (((S1,t;S1,t+1), ..., (S1,t+n-1;S1,t+n)), ...,
> >            ((Sk,t;Sk,t+1), ..., (Sk,t+n-1;Sk,t+n)))
> > Reduce step:
> >   ((%S1,t+1 ... %S1,t+n), ..., (%Sk,t+1 ... %Sk,t+n))
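
The map and reduce steps proposed above can be simulated in plain Python to check the data flow. This is only a sketch: it does not use the Hadoop API, the grouping dictionary stands in for the shuffle phase, and the function names, keys and sample values are all hypothetical.

```python
from collections import defaultdict

def map_step(stock, series):
    """Emit one (stock, (index, previous, current)) record per consecutive pair."""
    for i in range(1, len(series)):
        yield stock, (i, series[i - 1], series[i])

def reduce_step(stock, pairs):
    """Turn one stock's pairs into percentage changes, ordered by time index."""
    ordered = sorted(pairs)  # records may arrive in any order after the shuffle
    return stock, [100.0 * (cur - prev) / prev for _, prev, cur in ordered]

def run(dataset):
    """Simulate map, shuffle (group by key) and reduce in memory."""
    grouped = defaultdict(list)
    for stock, series in dataset.items():
        for key, value in map_step(stock, series):
            grouped[key].append(value)
    return dict(reduce_step(stock, pairs) for stock, pairs in grouped.items())

# Hypothetical input: one price series per stock.
result = run({"S1": [100.0, 110.0, 99.0], "Sk": [20.0, 25.0, 20.0]})
```

If each input record already holds the whole series for one stock, as in this sketch, the percentages could also be computed directly in the map step, and the reduce step would have nothing left to combine.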
> >
> > I apologize for my beginner's questions but... everyone has to start
> > somewhere :-)
> >
> > BR,
> >
> > Florent Empis
> >
>
