mahout-user mailing list archives

From Grant Ingersoll <>
Subject Re: Clustering techniques, tips and tricks
Date Sun, 03 Jan 2010 16:10:09 GMT

On Dec 31, 2009, at 3:39 PM, Ted Dunning wrote:

> - can the clustering algorithm be viewed in a probabilistic framework
> (k-means, LDA, Dirichlet = yes, agglomerative clustering using nearest
> neighbors = not so much)
> - is the definition of a cluster abstract enough to be flexible with regard
> to whether a cluster is a model or does it require stronger limits.
> (k-means = symmetric Gaussian with equal variance, Dirichlet = almost any
> probabilistic model)

Can you elaborate a bit more on these two? I can see some of the probability side, since those
approaches factor into how similarity is determined, but I don't get the significance
of "cluster as a model". Is it just a simplification that then makes it easier to ask: does
this document fit the model?
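As an illustration, here is a minimal sketch of what "cluster as a model" can mean for k-means. It assumes, following the quoted description, that each cluster is an isotropic ("symmetric") Gaussian with one shared variance, and it also assumes equal cluster weights. Under those assumptions the log-likelihood of a point under cluster k is a constant minus its squared distance to the centroid divided by 2σ², so picking the most likely model gives the same answer as picking the nearest centroid. The likelihood also gives a score for how well a point fits its best model. All function names, the centroids, and the points are hypothetical and chosen for illustration; this is not Mahout code.

```python
import math


def sq_dist(x, mu):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def gaussian_log_likelihood(x, mu, sigma):
    """Log density of x under an isotropic Gaussian N(mu, sigma^2 * I)."""
    d = len(x)
    return -0.5 * d * math.log(2 * math.pi * sigma ** 2) - sq_dist(x, mu) / (2 * sigma ** 2)


def kmeans_assign(x, centroids):
    """Plain k-means assignment: index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))


def model_assign(x, centroids, sigma=1.0):
    """Treat each cluster as a Gaussian model (shared sigma, equal weights).

    Returns (index of the most likely model, its log-likelihood); the
    log-likelihood says how well x fits that best model.
    """
    scores = [gaussian_log_likelihood(x, mu, sigma) for mu in centroids]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical 2-D centroids and points, for illustration only.
centroids = [(0.0, 0.0), (5.0, 5.0), (0.0, 6.0)]
points = [(1.0, 0.5), (4.0, 4.5), (0.5, 5.0), (1.0, 4.0)]

for p in points:
    # Both columns agree: max likelihood == nearest centroid here.
    print(p, kmeans_assign(p, centroids), model_assign(p, centroids)[0])

# A distant point still gets assigned, but its best log-likelihood is far
# lower, which is one way to ask "does this point fit any model at all?"
print(model_assign((20.0, -20.0), centroids))
```

With a different per-cluster model (unequal variances, or a non-Gaussian distribution, as the Dirichlet approach allows) the argmax of the likelihood would no longer reduce to nearest-centroid assignment.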
