mahout-user mailing list archives

From nfantone <nfant...@gmail.com>
Subject Re: Clustering from DB
Date Mon, 27 Jul 2009 18:33:35 GMT
> Well, it does matter to some degree since picking random vectors tends to give you dense
> vectors whereas text gives you very sparse vectors.
>
> Different patterns of sparsity can cause radically different time complexity
> for the clustering.

I have yet to find a random combination of vectors that substantially
improves the performance of kMeans. I have also tried real datasets
(like the one I was initially using, a large dataset describing
consumers' buying habits), to no avail. How should a collection of
vectors be constructed so that it does not significantly degrade the
algorithm's performance?
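
To make the dense-versus-sparse distinction from the quoted reply concrete, here is a minimal sketch in plain Java of generating random vectors with a controlled fraction of nonzero entries. It deliberately does not use Mahout's vector classes; the class name SparseVectorSketch, the method randomVector and the density parameter are all hypothetical names chosen for illustration. A density of 1.0 gives the fully dense vectors that naive random generation produces, while a small value such as 0.01 is closer in shape to text-like data.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical helper, not part of Mahout: builds random test vectors
// whose sparsity is controlled by a single density parameter.
public class SparseVectorSketch {

    // Returns a vector as an index -> value map holding only nonzero entries.
    // Each dimension is given a value with probability `density`.
    public static Map<Integer, Double> randomVector(int dimensions, double density, Random rng) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (int i = 0; i < dimensions; i++) {
            if (rng.nextDouble() < density) {
                // 1.0 - nextDouble() lies in (0, 1], so a stored value is never zero.
                vector.put(i, 1.0 - rng.nextDouble());
            }
        }
        return vector;
    }

    // Fraction of dimensions that actually hold a value.
    public static double observedDensity(Map<Integer, Double> vector, int dimensions) {
        return (double) vector.size() / dimensions;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        int dimensions = 10_000;

        Map<Integer, Double> dense = randomVector(dimensions, 1.0, rng);
        Map<Integer, Double> sparse = randomVector(dimensions, 0.01, rng);

        System.out.println("dense density:  " + observedDensity(dense, dimensions));
        System.out.println("sparse density: " + observedDensity(sparse, dimensions));
    }
}
```

Generating several collections at different densities and timing the clustering on each would be one way to check how much the sparsity pattern matters for a given dataset size.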
