mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: kMeans Help
Date Fri, 26 Jun 2009 21:21:57 GMT
Running the latest trunk, I get a file-not-found exception on the 
$output/data file when running the synthetic control example. It looks 
like the output got deleted somewhere, but I have not yet discovered 
where. Perhaps Canopy is broken, or KMeans is purging the output?


Grant Ingersoll wrote:
> I'm running trunk, using the data at 
> http://people.apache.org/wikipedia/n2.tar.gz (a dump of 2302 documents 
> from a Lucene index of Wikipedia; the chunks file in that same 
> directory contains the original files).  Vectors are normalized using L2.
>
> When I run K-Means on it via: 
> org.apache.mahout.clustering.kmeans.KMeansDriver --input 
> /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/part-full.txt 
> --clusters 
> /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/clusters 
> --k 10 --output 
> /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/k-output 
> --distance org.apache.mahout.utils.CosineDistanceMeasure
>
> I get the two directories seen in n2-output.  The clusters-0 and 
> clusters-1 files both contain a single vector which is all 0.
>
> I've also tried SquaredEuclidean, but to no avail.
>
> Any insight into what I'm doing wrong would be appreciated.
>
> Thanks,
> Grant
>
>
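
For the all-zero centroids described in the quoted message, one thing that 
may be worth ruling out is how cosine distance behaves when a vector has 
zero length. The sketch below is standalone Java on plain double arrays; it 
does not use Mahout's CosineDistanceMeasure or its vector classes, and the 
class and method names (VectorSanityCheck, l2Normalize, l2Norm, 
cosineDistance) are made up for illustration. It only shows the arithmetic: 
an L2-normalized vector has unit length, and the cosine distance to an 
all-zero vector is undefined (0/0), reported here as NaN. Whether this is 
what happens inside KMeansDriver is an assumption, not something confirmed 
in this thread.

```java
import java.util.Arrays;

public class VectorSanityCheck {

    /** Euclidean (L2) length of v. */
    public static double l2Norm(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    /** Returns a copy of v scaled to unit L2 length; a zero vector is returned unchanged. */
    public static double[] l2Normalize(double[] v) {
        double norm = l2Norm(v);
        if (norm == 0.0) {
            return v.clone();
        }
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    /** Cosine distance, 1 - cos(angle between a and b); NaN if either vector has zero length. */
    public static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
        }
        double denom = l2Norm(a) * l2Norm(b);
        return denom == 0.0 ? Double.NaN : 1.0 - dot / denom;
    }

    public static void main(String[] args) {
        double[] doc = l2Normalize(new double[] {3.0, 4.0}); // hypothetical document vector
        double[] zero = new double[] {0.0, 0.0};             // an all-zero "centroid"

        System.out.println("normalized doc: " + Arrays.toString(doc));
        System.out.println("distance doc-doc:  " + cosineDistance(doc, doc));
        System.out.println("distance doc-zero: " + cosineDistance(doc, zero)); // NaN
    }
}
```

Running the same kind of check over the real input (confirming that no 
vector in part-full.txt is all zeros after parsing) would help separate an 
input-parsing problem from a problem in the clustering step itself.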

