mahout-dev mailing list archives

From Pallavi Palleti <pallavi.pall...@corp.aol.com>
Subject Re: Fuzzy K Means
Date Fri, 19 Feb 2010 08:23:04 GMT
Apologies. My earlier observations were with m=2, where the points were
all close together. However, when I tried m=3, I found the clusters
much better than what we see with m=2. Also, I am using the cluster
initialization patch for initializing the clusters.

Thanks
Pallavi

Robin Anil wrote:
> Yes, I am seeing the same behaviour with m=2 but the convergence is faster
>
> On Wed, Feb 17, 2010 at 11:21 PM, Palleti, Pallavi <
> pallavi.palleti@corp.aol.com> wrote:
>
>   
>> How many iterations of FuzzyKMeans are you running? Here is my
>> observation: when I ran for a few iterations, the cluster centroids
>> were far off. However, when I ran for more than 50 iterations or so,
>> the cluster points were still different, but so close together that
>> they were nearly the same. By the way, I am using m=3 in the
>> membership function.
>>
>> Thanks
>> Pallavi
>>
>> -----Original Message-----
>> From: Robin Anil [mailto:robin.anil@gmail.com]
>> Sent: Wednesday, February 17, 2010 8:10 PM
>> To: mahout-dev@lucene.apache.org
>> Subject: Re: Fuzzy K Means
>>
>> Tests are passing fine, but not when testing on Reuters.
>>
>> On Wed, Feb 17, 2010 at 8:07 PM, Pallavi Palleti <
>> pallavi.palleti@corp.aol.com> wrote:
>>
>>     
>>> If we just need to verify with some sample dataset, we already have
>>> the data in the TestFuzzyKMeansClustering code. Won't that suffice?
>>> Otherwise, I need to manually generate a sample dataset, as I don't
>>> have such a small dataset with me. I am actually running on MovieLens
>>> data, using movie ratings as the vector (movie as dimension, rating
>>> as coefficient) and user as the point.
>>>
>>> Thanks
>>> Pallavi
>>>
>>> Robin Anil wrote:
>>>
>>>       
>>>> I tracked the versions back to before the change to Writables was
>>>> done. There is no significant change in the code.
>>>>
>>>> Can you give me a small dataset, maybe 10 points with 5 dimensions?
>>>> I can verify the trunk in that case.
>>>>
>>>> Robin
>>>>
>>>> On Wed, Feb 17, 2010 at 7:49 PM, Pallavi Palleti <
>>>> pallavi.palleti@corp.aol.com> wrote:
>>>>
>>>>
>>>>
>>>>         
>>>>> I have a local version, which I submitted long ago. I am using it
>>>>> on real data, and it is not giving the same point for all clusters.
>>>>> However, I haven't tried the latest Mahout code. I kept my code
>>>>> outputting data as text so that it is easy for me to verify. The
>>>>> current Mahout code, however, outputs binary data (as a
>>>>> SequenceFile), so it is difficult to verify.
>>>>>
>>>>>
>>>>> Thanks
>>>>> Pallavi
>>>>>
>>>>> Robin Anil wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> Have you verified the trunk code on some real data? I am getting
>>>>>> the same point for all clusters regardless of the distance measure.
>>>>>>
>>>>>> Robin
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 17, 2010 at 6:41 PM, Pallavi Palleti <
>>>>>> pallavi.palleti@corp.aol.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Yes. It shouldn't be a problem. My point was that we inherit
>>>>>>> numpoints by extending ClusterBase, though we are not using it
>>>>>>> in SoftCluster.
>>>>>>> Other than that, I don't see any issue w.r.t. functionality.
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Pallavi
>>>>>>>
>>>>>>> Robin Anil wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> In the impl of SoftCluster, writeOut calculates the centroid
>>>>>>>> and writes it, and read(in) reads the centroid into the center.
>>>>>>>> In ClusterDumper it reads into the ClusterBase and calls
>>>>>>>> value.getCenter(). It should work normally, right?
>>>>>>>>
>>>>>>>> Robin
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 17, 2010 at 6:02 PM, Pallavi Palleti <
>>>>>>>> pallavi.palleti@corp.aol.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Yes. But not the total number of points. So, the numpoints
>>>>>>>>> from ClusterBase will not be used in SoftCluster. numpoints
>>>>>>>>> is specific to k-means, similar to the weighted point total
>>>>>>>>> for fuzzy k-means.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Robin Anil wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> The center is still the averaged-out centroid, right?
>>>>>>>>>> weightedtotalvector/totalprobWeight
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Feb 17, 2010 at 5:10 PM, Pallavi Palleti <
>>>>>>>>>> pallavi.palleti@corp.aol.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> I haven't yet gone through ClusterDumper. However,
>>>>>>>>>>> ClusterBase would have the number of points to average
>>>>>>>>>>> out (pointTotal/numPoints, as per k-means), whereas
>>>>>>>>>>> SoftCluster will have the weighted point total. So I am
>>>>>>>>>>> wondering how we can reuse ClusterBase here?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Pallavi
>>>>>>>>>>>
>>>>>>>>>>> Robin Anil wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                       
>>>>>>>>>>>> Yes, so that ClusterDumper can print it out.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Feb 17, 2010 at 5:02 PM, Pallavi Palleti <
>>>>>>>>>>>> pallavi.palleti@corp.aol.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                         
>>>>>>>>>>>>> Hi Robin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> When you say reusing ClusterBase, are you planning to
>>>>>>>>>>>>> extend ClusterBase in SoftCluster? For example,
>>>>>>>>>>>>> SoftCluster extends ClusterBase?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Pallavi
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Robin Anil wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>> I have been trying to convert the FuzzyKMeans
>>>>>>>>>>>>>> SoftCluster (which should ideally be named
>>>>>>>>>>>>>> FuzzyKmeansCluster) to use the ClusterBase.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am getting *the same center* for all the clusters.
>>>>>>>>>>>>>> To aid the conversion, all I did was remove the center
>>>>>>>>>>>>>> vector from the SoftCluster class and reuse the same
>>>>>>>>>>>>>> from the ClusterBase. This essentially makes no change
>>>>>>>>>>>>>> in the tests, which pass correctly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I am questioning whether the implementation keeps
>>>>>>>>>>>>>> the average center at all. Is anyone who has used
>>>>>>>>>>>>>> FuzzyKMeans experiencing this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Robin

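For readers following the thread, here is a minimal sketch of the
standard fuzzy k-means (fuzzy c-means) update that the discussion refers
to: memberships controlled by the fuzziness exponent m, and each center
computed as the weighted point total divided by the total weight. It is
written in Python for brevity and is not taken from Mahout's code; the
function names memberships and update_centers are hypothetical, and
Euclidean distance is an assumption.

```python
import math


def memberships(point, centers, m):
    """Membership of one point in each cluster (textbook fuzzy c-means, m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs only to those centers.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


def update_centers(points, centers, m):
    """One iteration: center = weighted point total / total weight."""
    dim = len(points[0])
    weighted_totals = [[0.0] * dim for _ in centers]
    weight_sums = [0.0] * len(centers)
    for p in points:
        for j, u in enumerate(memberships(p, centers, m)):
            w = u ** m  # weight of this point for cluster j
            weight_sums[j] += w
            for d in range(dim):
                weighted_totals[j][d] += w * p[d]
    new_centers = []
    for j, old in enumerate(centers):
        if weight_sums[j] == 0.0:
            # No point contributes to this cluster; keep the old center.
            new_centers.append(list(old))
        else:
            new_centers.append([t / weight_sums[j] for t in weighted_totals[j]])
    return new_centers
```

For a point at distance 1 from one center and 3 from another, this gives
memberships of 0.9 and 0.1 with m=2, and 0.75 and 0.25 with m=3, so a
larger m spreads each point's weight more evenly across the clusters.
Note that the center never uses a plain point count, only the sum of
weights, which is the numpoints question raised in the thread.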