mahout-user mailing list archives

From Vckay <darkvc...@gmail.com>
Subject Re: Kernels for Text Clustering
Date Thu, 14 Jul 2011 10:11:26 GMT
I'm not sure what you mean by "raw text data". I am doing the usual: removing
stop words, stemming, etc., and then computing TF-IDF vectors before
clustering them.
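
The preprocessing described above can be sketched as follows. This is a minimal, stdlib-only illustration assuming a tiny hypothetical stop-word list and a crude suffix-stripping `toy_stem` that stands in for a real stemmer such as Porter's. The idf formula log(N / df) is one common textbook variant and is not necessarily the weighting Mahout's own vectorizer applies; all function names here are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real lists are much longer).
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def toy_stem(word):
    # Crude suffix stripping for illustration only; NOT the Porter stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    # Lowercase, keep runs of ASCII letters, drop stop words, then stem.
    words = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(w) for w in words if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, L2-normalised."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # tf * log(N / df); a term present in every document gets weight 0.
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors
```

For example, `tfidf_vectors(["Clustering text documents", "Kernels for text clustering", "The kernel trick"])` yields three unit-length sparse vectors in which the rarer stem `document` outweighs `text` in the first one.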


2011/7/14 Fernando Fernández <fernando.fernandez.gonzalez@gmail.com>

> Hi Vckay,
>
> Are you using raw text data with k-means? It's common to first obtain a
> lower-dimensional, dense representation of the documents using Singular
> Value Decomposition (SVD) or similar techniques, and to work with that
> representation instead. You may want to take a look at the SVD algorithms
> in Mahout.
>
> Best,
> Fernando.
>
> 2011/7/14 Vckay <darkvckay@gmail.com>
>
> > I am clustering some real-world text data using K-Means. I recently came
> > across Kernel K-Means and wanted to know whether someone who has
> > experience with kernels could comment on their appropriateness for text
> > data, i.e., would using a kernel boost k-means quality? (I know this is
> > rather general, but it is hard to figure out whether my high-dimensional
> > real-world data is linearly separable.) If so, are there any kernels with
> > "practically accepted" parameters?
> >
> > Thanks
> > VC
> >
>
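
For readers weighing the kernel question, the following is a minimal sketch of kernel k-means. It assumes the whole n × n Gram matrix fits in memory and uses farthest-first seeding, an arbitrary choice made here rather than anything from the thread. The names `kernel_kmeans`, `rbf_kernel` and `linear_kernel` are hypothetical (not a Mahout API), and the RBF `gamma` is purely illustrative: the thread does not settle whether any kernel helps on TF-IDF vectors or which parameters are "practically accepted". With `linear_kernel` the procedure is exactly ordinary k-means, which on L2-normalised TF-IDF vectors is closely tied to cosine similarity, since ||x - y||² = 2 - 2·cos(x, y) for unit vectors.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    # Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def linear_kernel(x, y):
    # Plain dot product; kernel k-means with this kernel is ordinary k-means.
    return sum(a * b for a, b in zip(x, y))


def kernel_kmeans(points, k, kernel, max_iter=100, seed=0):
    """Cluster `points` into k groups; returns one label per point."""
    n = len(points)
    gram = [[kernel(p, q) for q in points] for p in points]  # O(n^2) memory

    def dist_to_point(i, s):
        # Squared feature-space distance ||phi(x_i) - phi(x_s)||^2.
        return gram[i][i] - 2.0 * gram[i][s] + gram[s][s]

    # Farthest-first seeding: start from a random point, then repeatedly add
    # the point farthest (in feature space) from all seeds chosen so far.
    rng = random.Random(seed)
    seeds = [rng.randrange(n)]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(dist_to_point(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: dist_to_point(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        clusters = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # Cluster-only term of the distance: (1/|C|^2) * sum_{j,l in C} K(j, l).
        self_terms = [
            sum(gram[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in clusters
        ]
        new_labels = []
        for i in range(n):
            best, best_dist = labels[i], math.inf
            for c, members in enumerate(clusters):
                if not members:
                    continue  # skip clusters that emptied out
                # ||phi(x_i) - mu_c||^2 = K(i,i) - (2/|C|) sum_j K(i,j) + self term
                d = (gram[i][i]
                     - 2.0 * sum(gram[i][j] for j in members) / len(members)
                     + self_terms[c])
                if d < best_dist:
                    best, best_dist = c, d
            new_labels.append(best)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return labels
```

A call might look like `kernel_kmeans(vectors, 2, lambda x, y: rbf_kernel(x, y, gamma=1.0))`. The quadratic Gram matrix is the main practical obstacle for large corpora, which may make the dimensionality-reduction route suggested in the thread a more practical starting point.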

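To make the SVD suggestion concrete, the sketch below approximates the top-k singular values and right singular vectors of a small dense document-term matrix (rows are documents, columns are terms) by power iteration on AᵀA with deflation, then projects each document onto those vectors. The result is the k-dimensional dense representation U_k Σ_k that would then be clustered. It is a toy for illustration only: the helper names are hypothetical, and the SVD tools in Mahout are built for large, distributed sparse matrices, which this is not.

```python
import math
import random


def mat_vec(a, v):
    # Dense matrix (list of rows) times vector.
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def top_k_svd(a, k, iters=200, seed=0):
    """Approximate the top-k singular values and right singular vectors of `a`
    by power iteration on A^T A, orthogonalising against vectors already found."""
    rng = random.Random(seed)
    at = transpose(a)
    n_cols = len(a[0])
    sigmas, right = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
        for _ in range(iters):
            w = mat_vec(at, mat_vec(a, v))  # w = A^T A v
            # Deflation: remove components along earlier singular vectors.
            for u in right:
                d = sum(x * y for x, y in zip(w, u))
                w = [x - d * y for x, y in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                v = [0.0] * n_cols  # exact zero: no further direction to find
                break
            v = [x / norm for x in w]
        sigmas.append(math.sqrt(sum(x * x for x in mat_vec(a, v))))  # sigma = ||A v||
        right.append(v)
    return sigmas, right


def reduce_rows(a, k):
    """Project each row (document) onto the top-k right singular vectors,
    i.e. compute U_k * Sigma_k: a dense k-dimensional vector per document."""
    _, right = top_k_svd(a, k)
    return [[sum(x * y for x, y in zip(row, v)) for v in right] for row in a]
```

For instance, `reduce_rows(dense_tfidf_rows, 100)`, with `dense_tfidf_rows` a hypothetical list of TF-IDF rows, would give 100-dimensional document vectors; the value of k is a tuning choice that the thread does not specify.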