mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject kMeans Help
Date Fri, 26 Jun 2009 20:59:47 GMT
I'm running trunk, using the data at http://people.apache.org/wikipedia/n2.tar.gz
(a dump of 2302 documents from a Lucene index of Wikipedia; the chunks
file in that same directory contains the original files).
Vectors are normalized using L2.
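
For reference, assuming the usual definition, L2 normalization divides each vector by its Euclidean length so that it ends up with length 1. A minimal sketch in plain Java follows; the class and method names are hypothetical, and this is neither Mahout's API nor the code used to produce the data set above.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Mahout.
final class L2Normalize {

    /** Euclidean (L2) norm: square root of the sum of squared components. */
    static double norm(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    /**
     * Returns a new vector scaled to unit L2 length.
     * Assumption: an all-zero vector is returned unchanged (as a copy),
     * since it cannot be scaled to length 1.
     */
    static double[] normalize(double[] v) {
        double n = norm(v);
        double[] out = Arrays.copyOf(v, v.length);
        if (n == 0.0) {
            return out;
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= n;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] unit = normalize(new double[] {3.0, 4.0});
        System.out.println(Arrays.toString(unit) + " norm=" + norm(unit));
    }
}
```

An all-zero vector has no direction, so the sketch leaves it as-is rather than dividing by zero.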

When I run K-Means on it via:

org.apache.mahout.clustering.kmeans.KMeansDriver
  --input /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/part-full.txt
  --clusters /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/clusters
  --k 10
  --output /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/k-output
  --distance org.apache.mahout.utils.CosineDistanceMeasure

I get the two directories seen in n2-output.  The clusters-0 and
clusters-1 files each contain a single vector that is all zeros.

I've also tried SquaredEuclidean, but to no avail.

Any insight into what I'm doing wrong would be appreciated.

Thanks,
Grant
