On Dec 31, 2009, at 3:39 PM, Ted Dunning wrote:
> - can the clustering algorithm be viewed in a probabilistic framework
> (k-means, LDA, Dirichlet = yes, agglomerative clustering using nearest
> neighbors = not so much)
>
> - is the definition of a cluster abstract enough to be flexible with regard
> to whether a cluster is a model or does it require stronger limits.
> (k-means = symmetric Gaussian with equal variance, Dirichlet = almost any
> probabilistic model)
Can you elaborate a bit more on these two? I can see some of the point on the probability side, since those
approaches factor into how similarity is determined, but I don't get the significance
of "cluster as a model". Is it just a simplification that then makes it easier to ask: does
this document fit into the model?
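To make the question concrete, here is a minimal sketch of what "does this document fit into the model?" could look like if each cluster is a probabilistic model. It assumes the k-means case Ted describes (a symmetric Gaussian with one shared variance); `SymmetricGaussianCluster`, `log_pdf`, `most_likely_cluster` and `nearest_center` are hypothetical names for illustration, not Mahout's API. Under that assumption, picking the cluster with the highest likelihood gives the same answer as picking the nearest centroid.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SymmetricGaussianCluster:
    """Hypothetical cluster-as-model: isotropic Gaussian, shared variance."""
    center: Sequence[float]
    variance: float = 1.0

    def log_pdf(self, x: Sequence[float]) -> float:
        # Log density of an isotropic Gaussian; higher means "fits better".
        d = len(self.center)
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, self.center))
        return -0.5 * d * math.log(2 * math.pi * self.variance) - sq_dist / (2 * self.variance)


def most_likely_cluster(x: Sequence[float], clusters: List[SymmetricGaussianCluster]) -> int:
    # Probabilistic view: assign x to the model that gives it the highest likelihood.
    return max(range(len(clusters)), key=lambda i: clusters[i].log_pdf(x))


def nearest_center(x: Sequence[float], clusters: List[SymmetricGaussianCluster]) -> int:
    # Plain k-means view: assign x to the closest centroid.
    return min(range(len(clusters)), key=lambda i: math.dist(x, clusters[i].center))


clusters = [SymmetricGaussianCluster([0.0, 0.0]), SymmetricGaussianCluster([5.0, 5.0])]
point = [1.0, 0.5]
print(most_likely_cluster(point, clusters), nearest_center(point, clusters))
```

The payoff of the model view is that `log_pdf` could come from any distribution (the Dirichlet case), while the assignment step stays the same.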
-Grant