mahout-user mailing list archives

From Grant Ingersoll
Subject Re: randomseedgenerator
Date Wed, 01 Jul 2009 20:24:29 GMT

On Jul 1, 2009, at 2:07 PM, Adil Aijaz wrote:

> I was looking at the RandomSeedGenerator and, correct me if I am
> wrong, but it is not really random; rather, it runs a series of
> Bernoulli trials in which the points at the beginning of your
> data always have a higher chance of being selected than
> those near the end.

I was just going off of Ted's suggestion that for k-Means it wasn't
all that important for the initial seeds to be truly random.
We discussed PRNGs and an M/R way of doing it, but I didn't think it
was necessary for this.  Fine if someone else wants to take it up.
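
For reference, one standard way to give every point the same chance of
being picked in a single pass is reservoir sampling (Algorithm R). The
sketch below is only an illustration under stated assumptions: it is not
Mahout's RandomSeedGenerator, the class name ReservoirSeeds and the method
sample are hypothetical, and it runs on a single machine rather than as an
M/R job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of Mahout: picks k seeds uniformly in one pass. */
final class ReservoirSeeds {

  private ReservoirSeeds() {}

  /**
   * Returns k items chosen uniformly at random from points (or all of them
   * if there are fewer than k). Assumes fewer than Integer.MAX_VALUE points.
   */
  static <T> List<T> sample(Iterable<T> points, int k, Random rng) {
    if (k < 0) {
      throw new IllegalArgumentException("k must be non-negative");
    }
    List<T> reservoir = new ArrayList<>(k);
    if (k == 0) {
      return reservoir;
    }
    int seen = 0;
    for (T point : points) {
      seen++;
      if (reservoir.size() < k) {
        // The first k points fill the reservoir unconditionally.
        reservoir.add(point);
      } else {
        // Draw j uniformly from [0, seen); the new point replaces slot j
        // only when j < k, i.e. with probability k / seen.
        int j = rng.nextInt(seen);
        if (j < k) {
          reservoir.set(j, point);
        }
      }
    }
    return reservoir;
  }
}
```

Calling ReservoirSeeds.sample(points, k, new Random()) returns k seeds, with
a point near the end of the input as likely to be chosen as one near the
beginning. Passing a Random with a fixed seed makes the selection
repeatable.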

> Maybe that's not a problem, since given sufficient iterations k-means
> should converge toward a solution. But I thought I'd point it out
> in case there is an issue here.


> Adil

Grant Ingersoll

