mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: randomseedgenerator
Date Wed, 01 Jul 2009 19:20:14 GMT
I hadn't looked at this before, but yes, you have a good point: it does
slightly favor earlier elements. There's no reason the selection
shouldn't be properly random, even if it makes little difference in
practice.

mahout-dev, mind if I...

1. Make the algorithm select k elements uniformly at random?
2. May we standardize on the RandomUtils class for creating random
number generators? That lets us control the implementation and, better,
make it deterministic at will for testing.
3. And may I fix up some style stuff? For instance, nothing should be
both 'transient' and 'static', "== true" is redundant, etc.
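
For point 1, one standard way to pick k elements uniformly in a single
pass is reservoir sampling (Algorithm R). The sketch below is only an
illustration of that technique, not the current RandomSeedGenerator code
or a proposed patch. The class and method names (ReservoirSample, sample)
are hypothetical, and it takes a plain java.util.Random rather than
anything from RandomUtils, so a fixed seed stands in for the
"deterministic at will" behaviour mentioned in point 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Mahout: uniform selection of k items
// from a stream of unknown length using reservoir sampling (Algorithm R).
final class ReservoirSample {
  private ReservoirSample() {}

  static <T> List<T> sample(Iterable<T> items, int k, Random rng) {
    if (k < 0) {
      throw new IllegalArgumentException("k must be non-negative");
    }
    List<T> reservoir = new ArrayList<>(k);
    int seen = 0; // items read so far (assumes fewer than 2^31 items)
    for (T item : items) {
      seen++;
      if (reservoir.size() < k) {
        // The first k items always enter the reservoir.
        reservoir.add(item);
      } else {
        // Item number 'seen' is kept with probability k / seen and,
        // if kept, evicts a uniformly chosen current member.
        int j = rng.nextInt(seen); // uniform in [0, seen)
        if (j < k) {
          reservoir.set(j, item);
        }
      }
    }
    // Holds min(k, seen) items; each input item had equal chance.
    return reservoir;
  }
}
```

Calling sample(points, k, new Random(42L)) twice over the same input
returns the same k items, which is the property a test would want. In
normal use an unseeded Random (or whatever generator the project
standardizes on) would be passed instead.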

On Wed, Jul 1, 2009 at 7:07 PM, Adil Aijaz<adil@yahoo-inc.com> wrote:
> I was looking at the RandomSeedGenerator and, correct me if I am wrong, but
> it is not really random; rather it does a bunch of bernoulli trials where
> the points that are in the beginning of your data are always going to have a
> higher chance of being selected than those near the end.
>
> Maybe that's not a problem since given sufficient iterations kmeans should
> converge toward a solution. But, I thought I'd point it out in case there is
> an issue here.
>
> Adil
>
