mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: kMeans Help
Date Fri, 26 Jun 2009 23:59:00 GMT
We need to handle that separately from the various jobs, then.
That was one of the things that was different about the KMeansJob call.

On Jun 26, 2009, at 7:45 PM, Jeff Eastman wrote:

> Found the call in the syntheticcontrol/kmeans.Job had true for the  
> overwrite output flag. Don't think that was your problem, but  
> something similar must be at work.
>
>
>
> Jeff Eastman wrote:
>> Running the latest trunk, I get a file not found exception running  
>> synthetic control on the $output/data file. Looks like output got  
>> deleted somewhere but have not discovered where yet. Perhaps Canopy  
>> is broken or KMeans is purging output?
>>
>>
>> Grant Ingersoll wrote:
>>> I'm running trunk, using the data at http://people.apache.org/wikipedia/n2.tar.gz
>>> (a dump of 2302 documents from a Lucene index of Wikipedia.  The
>>> chunks file in that same directory contains the original files).   
>>> Vectors are normalized using L2.
>>>
>>> When I run K-Means on it via:
>>>   org.apache.mahout.clustering.kmeans.KMeansDriver
>>>     --input /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/part-full.txt
>>>     --clusters /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/clusters
>>>     --k 10
>>>     --output /Users/grantingersoll/projects/lucene/solr/wikipedia/devWorks/n2/k-output
>>>     --distance org.apache.mahout.utils.CosineDistanceMeasure
>>>
>>> I get the two directories seen in n2-output.  The clusters-0 and  
>>> clusters-1 files both contain a single vector which is all 0.
>>>
>>> I've also tried SquaredEuclidean, but to no avail.
>>>
>>> Any insight into what I'm doing wrong would be appreciated.
>>>
>>> Thanks,
>>> Grant
>>>
>>>
>>
>>
>>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search
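
For readers following the thread: the original report mentions L2-normalized vectors, a cosine distance measure, and cluster files holding a single all-zero vector. The sketch below is not Mahout's code. It is a small Python illustration, with hypothetical helper names (l2_normalize, cosine_distance), of what those two operations compute and of why an all-zero vector is a degenerate input for cosine distance. It makes no claim about the actual cause of the problem discussed above.

```python
import math


def l2_normalize(v):
    """Scale v to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b), or None if either vector is all zero.

    Returning None for a zero vector is a choice made for this sketch only;
    a real library may handle that case differently.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return None
    return 1.0 - dot / (norm_a * norm_b)
```

With these helpers, cosine_distance([1.0, 0.0], [0.0, 1.0]) gives 1.0 for orthogonal vectors, while cosine_distance([1.0, 0.0], [0.0, 0.0]) gives None, since the angle to a zero vector is undefined.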

