mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shengchao Ding <dingshengc...@gmail.com>
Subject Re: org.apache.mahout.clustering.kmeans.RandonSeedGenerator.java problem
Date Sun, 13 Jan 2013 18:40:37 GMT
Mahout may need to extract the random sampling as an independent class
since it's a very often used algorithm.

On Sat, Jan 12, 2013 at 8:30 PM, Sean Owen <srowen@gmail.com> wrote:
> I'm sure someone fixed something like this a while ago but yes I still
> see this in the code. Search JIRA and file a bug
>
> On Sun, Jan 13, 2013 at 2:20 AM, sam wu <swu5530@gmail.com> wrote:
>> Sorry for my previous email, apparently not yet finished but get sent out.
>>
>> Here is the complete one
>>
>> Similar to MIA ch09 RandomPointUtil.java, random selection is not uniform
>> random.
>>
>> org.apache.mahout.clustering.kmeans.RandonSeedGenerator.java problem
>> --------
>> line 96-109
>>
>> if (currentSize < k) {
>>
>>             //select
>>
>>           } else if (random.nextInt(currentSize + 1) != 0) { // with chance
>> 1/(currentSize+1) pick new element
>>
>>             int indexToRemove = random.nextInt(currentSize); // evict one
>> chosen randomly
>>
>>             // replace with new
>>
>>           }
>>
>> -------
>>
>> again this is not uniform random.
>>
>> later sample will get much higher probability to be selected than beginning
>> sample.
>>
>> because currentSize stay to be k after initial k samples. and new sample
>> will be picked with 1/(k+1) probability.
>>
>> So, all ending samples will be selected with much higher prob.
>>
>> In case of 1000 samples, k=3 , most likely selected 3 samples will be > 980
>>
>>
>> Sam
>>
>> On Sat, Jan 12, 2013 at 6:07 PM, sam wu <swu5530@gmail.com> wrote:
>>
>>> Similar to MIA ch09 RandomPointUtil.java, random selection is not uniform
>>> random.
>>>
>>> org.apache.mahout.clustering.kmeans.RandonSeedGenerator.java problem
>>>
>>> line 96-109
>>>
>>> if (currentSize < k) {
>>>
>>>             //select
>>>
>>>           } else if (random.nextInt(currentSize + 1) != 0) { // with
>>> chance 1/(currentSize+1) pick new element
>>>
>>>             int indexToRemove = random.nextInt(currentSize); // evict one
>>> chosen randomly
>>>
>>>             // replace with new
>>>
>>>           }
>>>



-- 
Shengchao Ding

Mime
View raw message