mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: kMeans Help
Date Fri, 26 Jun 2009 23:45:57 GMT
Found it: the call in syntheticcontrol/kmeans.Job was passing true for 
the overwrite-output flag. I don't think that was your problem, but 
something similar must be at work.
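
For anyone following along, here is a rough sketch of the failure mode I mean. It is not the actual Job code: prepareOutput and dataSurvives are hypothetical names, and plain java.nio stands in for the Hadoop filesystem calls. It only shows how a second step that "overwrites" a shared output directory can remove the data an earlier step wrote there.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class OverwriteSketch {

    // Hypothetical helper: prepare an output directory for a job step.
    // When overwrite is true, everything already under output is deleted first.
    static void prepareOutput(Path output, boolean overwrite) {
        try {
            if (overwrite && Files.exists(output)) {
                try (Stream<Path> walk = Files.walk(output)) {
                    // Reverse order so children are deleted before their parents.
                    walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                        try {
                            Files.delete(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    });
                }
            }
            Files.createDirectories(output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates two chained steps sharing one output directory and reports
    // whether the first step's data file is still present afterwards.
    static boolean dataSurvives(boolean overwriteInSecondStep) {
        try {
            Path output = Files.createTempDirectory("sketch-output");
            // Step 1 (assumed): writes its results under output/data.
            Path data = output.resolve("data");
            Files.createDirectories(data);
            Path part = data.resolve("part-00000");
            Files.write(part, "1.0 2.0\n".getBytes());
            // Step 2 (assumed): prepares the same output directory before running.
            prepareOutput(output, overwriteInSecondStep);
            return Files.exists(part);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("overwrite=false, data survives: " + dataSurvives(false));
        System.out.println("overwrite=true,  data survives: " + dataSurvives(true));
    }
}
```

With the flag set, the second step starts from an empty directory, so a later read of output/data fails with a file-not-found error.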



Jeff Eastman wrote:
> Running the latest trunk, I get a file not found exception running 
> synthetic control on the $output/data file. Looks like output got 
> deleted somewhere but have not discovered where yet. Perhaps Canopy is 
> broken or KMeans is purging output?
>
>
> Grant Ingersoll wrote:
>> I'm running trunk, using the data at 
>> http://people.apache.org/wikipedia/n2.tar.gz (a dump of 2302 
>> documents from a Lucene index of Wikipedia; the chunks file in that 
>> same directory contains the original files).  Vectors are normalized 
>> using L2.
>>
>> When I run K-Means on it via: 
>> org.apache.mahout.clustering.kmeans.KMeansDriver --input 
>> /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/part-full.txt 
>> --clusters 
>> /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/clusters 
>> --k 10 --output 
>> /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/k-output 
>> --distance org.apache.mahout.utils.CosineDistanceMeasure
>>
>> I get the two directories seen in n2-output.  The clusters-0 and 
>> clusters-1 files both contain a single vector which is all 0.
>>
>> I've also tried SquaredEuclidean, but to no avail.
>>
>> Any insight into what I'm doing wrong would be appreciated.
>>
>> Thanks,
>> Grant
>>
>>
>
>
>
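
One thing that may be worth ruling out for the all-zero clusters is whether any all-zero vectors are reaching the distance measure. The sketch below is only an illustration under my own assumptions: cosineDistance and isAllZero are hypothetical stand-ins written from the textbook definition (1 minus the cosine of the angle), not Mahout's CosineDistanceMeasure. It shows that with this definition the distance to a zero vector is undefined (0/0) and comes out as NaN in Java, so any comparison against it is false.

```java
public class ZeroVectorCheck {

    // Textbook cosine distance: 1 - (a . b) / (|a| * |b|).
    // Hypothetical stand-in, not the Mahout implementation.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // True when every component of the vector is exactly zero.
    static boolean isAllZero(double[] v) {
        for (double x : v) {
            if (x != 0.0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] point = {3.0, 4.0};
        double[] zeroCentroid = {0.0, 0.0};

        System.out.println("centroid all zero: " + isAllZero(zeroCentroid));
        System.out.println("distance to itself: " + cosineDistance(point, point));
        double d = cosineDistance(point, zeroCentroid);
        System.out.println("distance to zero centroid is NaN: " + Double.isNaN(d));
        // NaN never compares as smaller, so a "nearest cluster" test can misbehave.
        System.out.println("d < 1.0: " + (d < 1.0));
    }
}
```

A quick pass with a check like isAllZero over the input vectors and the initial clusters would show whether this case applies here.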

