mahout-user mailing list archives

From Bogdan Vatkov <>
Subject Re: Clustering techniques, tips and tricks
Date Wed, 06 Jan 2010 02:51:32 GMT
I have already customized the Lucene index-to-vector dumper quite a lot (e.g.
applying stop words read from a file and a stop regex), but I am wondering how
the input vectors can be reached later if I start from the cluster vectors. You
say the points somehow do that; where can I read more, or can you tell me more?
Is there a piece of code that would best guide me through the points format?
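The quoted reply says the vector labels (e.g. a document ID field chosen when dumping the Lucene index) are carried through to the points file at the end of the k-means run. Once those points have been dumped to text, getting from a cluster back to its input documents is a simple grouping step. The sketch below assumes a hypothetical dump format of one `<clusterId>\t<vectorLabel>` pair per line; this is an illustration only, not Mahout's actual on-disk points format, and `group_labels_by_cluster` is a hypothetical helper.

```python
from collections import defaultdict


def group_labels_by_cluster(lines):
    """Group vector labels (e.g. document IDs) by the cluster they were assigned to.

    ASSUMPTION: each input line is a hypothetical text dump of the points
    output in the form "<clusterId>\t<vectorLabel>". This is not Mahout's
    on-disk format; adapt the parsing to whatever your dump tool emits.
    """
    clusters = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the dump
        cluster_id, label = line.split("\t", 1)
        clusters[cluster_id].append(label)
    return dict(clusters)


if __name__ == "__main__":
    sample = ["0\tdoc-17", "1\tdoc-03", "0\tdoc-42"]
    print(group_labels_by_cluster(sample))
    # prints {'0': ['doc-17', 'doc-42'], '1': ['doc-03']}
```

This only works if the labels were attached when the vectors were created; without a label field, the points carry no document identity to group.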

On Wed, Jan 6, 2010 at 4:43 AM, Drew Farris <> wrote:

> Each iteration of k-means produces a clusters-X folder, with X starting
> at 0. You would get only clusters-0 in cases where the clusters converge
> after the first run.
> Whether your clusters will retain document IDs depends on how you
> create the vectors. For example, the Lucene vector dumper can be told
> to extract the value from a specific field in the index to use for the
> vector labels. These are carried through to the points file produced
> at the end of the k-means run.
> On Tue, Jan 5, 2010 at 9:36 PM, Bogdan Vatkov <>
> wrote:
> > Is there some description of the content of the cluster vector?
> > I also noticed that I end up with folders clusters-0 and clusters-1,
> > but sometimes only clusters-0. When do we get the different folders,
> > and which one should be used as the end result, e.g. by the ClusterDumper?
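Following Drew's description (one clusters-X folder per iteration, X starting at 0), the folder from the last iteration is the natural candidate for the end result. A minimal sketch for picking it from a directory listing, assuming the highest X marks the final iteration; `final_clusters_dir` is a hypothetical helper, not part of Mahout. X is compared numerically because a plain string sort would put "clusters-10" before "clusters-9".

```python
import re

# Matches iteration folder names such as "clusters-0" or "clusters-12".
_CLUSTERS_DIR = re.compile(r"^clusters-(\d+)$")


def final_clusters_dir(names):
    """Return the clusters-X name with the largest X, or None if there is none.

    ASSUMPTION: one clusters-X folder is written per k-means iteration, so
    the largest X corresponds to the last (final) iteration.
    """
    best_name, best_x = None, -1
    for name in names:
        m = _CLUSTERS_DIR.match(name)
        if m and int(m.group(1)) > best_x:
            best_name, best_x = name, int(m.group(1))
    return best_name


if __name__ == "__main__":
    print(final_clusters_dir(["clusters-0", "clusters-1", "points"]))  # clusters-1
```

For a local copy of the output, the names could come from `os.listdir(output_dir)`; on HDFS you would list the output directory with the Hadoop tools instead.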

Best regards,
