mahout-dev mailing list archives

From Masoud Moshref Javadi <moshr...@usc.edu>
Subject Re: kmeans centroid using or function
Date Tue, 17 Jul 2012 06:33:33 GMT
Each cluster takes storage space equal to the number of 1 bits in the Or() of
all its points, and the nearest cluster to a point is the one whose size
changes least (ideally not at all) when that point is included. So the Or() of
the bits of a cluster's points can serve as the representative of that cluster.
It seems that I need a subclass of AbstractCluster that keeps track of this
Or() bit vector.
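A minimal sketch of the idea above, assuming points are plain java.util.BitSet values rather than Mahout vectors. OrCluster, growthIfAdded, nearest and bits are hypothetical names for illustration only; this is not Mahout's AbstractCluster API. The "distance" from a point to a cluster is taken to be the number of new 1 bits the cluster's Or() would gain if the point joined it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: clusters represented by the bitwise OR of their members. */
public class OrClusterSketch {

    static final class OrCluster {
        // OR of all points added so far; acts as the cluster's representative
        private final BitSet orVector = new BitSet();

        /** Storage cost: number of 1 bits in the OR of the cluster's points. */
        int size() {
            return orVector.cardinality();
        }

        /** Number of 1 bits the cluster would gain if this point were added. */
        int growthIfAdded(BitSet point) {
            BitSet extra = (BitSet) point.clone();
            extra.andNot(orVector); // bits set in the point but not yet in the cluster
            return extra.cardinality();
        }

        void add(BitSet point) {
            orVector.or(point);
        }

        BitSet representative() {
            return (BitSet) orVector.clone();
        }
    }

    /** Index of the cluster whose size grows least for this point (ties go to the lowest index). */
    static int nearest(List<OrCluster> clusters, BitSet point) {
        int best = -1;
        int bestGrowth = Integer.MAX_VALUE;
        for (int i = 0; i < clusters.size(); i++) {
            int growth = clusters.get(i).growthIfAdded(point);
            if (growth < bestGrowth) {
                bestGrowth = growth;
                best = i;
            }
        }
        return best;
    }

    /** Helper: build a BitSet with the given bit indices set. */
    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) b.set(i);
        return b;
    }

    public static void main(String[] args) {
        List<OrCluster> clusters = new ArrayList<>();
        OrCluster a = new OrCluster();
        a.add(bits(0, 1, 2));
        OrCluster b = new OrCluster();
        b.add(bits(5, 6));
        clusters.add(a);
        clusters.add(b);

        BitSet p = bits(1, 2);                      // already covered by cluster a
        int idx = nearest(clusters, p);
        System.out.println(idx);                    // 0
        clusters.get(idx).add(p);
        System.out.println(clusters.get(0).size()); // 3 (unchanged)
    }
}
```

In an actual Mahout subclass the same Or() accumulation would presumably replace the running sum used to compute a mean; how that plugs into AbstractCluster is not shown here.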

On 7/16/2012 11:22 PM, Sean Owen wrote:
> Hmm, is that going to give a point that acts like a centroid, though, that
> is, a "mid point" under some distance metric? I don't think you want to do this.
>
> On Tue, Jul 17, 2012 at 5:04 AM, Masoud Moshref Javadi <moshrefj@usc.edu> wrote:
>
>> I want to run k-means on binary data, and the definition of centroid for my
>> application is the Or() of the bits of all points inside a cluster.
>> Where in Mahout should I make this change?
>>
>> --
>> Masoud Moshref Javadi
>> Computer Engineering PhD Student
>> Ming Hsieh Department of Electrical Engineering
>> University of Southern California
>>
>>
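Sean's concern about whether the Or() behaves like a "mid point" can be checked on a small example. This hypothetical Java sketch (plain java.util.BitSet, not Mahout's Vector or DistanceMeasure classes) uses Hamming distance as the assumed metric. Under Hamming distance, the coordinate-wise majority of the points minimizes the total distance to them, while the Or() can sit noticeably farther from every member.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical check: OR vs. bitwise majority as a "center" under Hamming distance. */
public class OrVsMajority {

    static int hamming(BitSet x, BitSet y) {
        BitSet diff = (BitSet) x.clone();
        diff.xor(y); // bits where x and y disagree
        return diff.cardinality();
    }

    static BitSet orOf(List<BitSet> points) {
        BitSet acc = new BitSet();
        for (BitSet p : points) acc.or(p);
        return acc;
    }

    /** Bit i is set only if strictly more than half of the points have bit i set. */
    static BitSet majorityOf(List<BitSet> points, int numBits) {
        BitSet result = new BitSet(numBits);
        for (int i = 0; i < numBits; i++) {
            int count = 0;
            for (BitSet p : points) if (p.get(i)) count++;
            if (2 * count > points.size()) result.set(i);
        }
        return result;
    }

    /** Sum of Hamming distances from every point to the given center. */
    static int totalDistance(List<BitSet> points, BitSet center) {
        int sum = 0;
        for (BitSet p : points) sum += hamming(p, center);
        return sum;
    }

    static BitSet bits(int... indices) {
        BitSet b = new BitSet();
        for (int i : indices) b.set(i);
        return b;
    }

    public static void main(String[] args) {
        // Three 4-bit points: 1100, 1010, 1001 (bit 0 written first)
        List<BitSet> pts = Arrays.asList(bits(0, 1), bits(0, 2), bits(0, 3));
        System.out.println(totalDistance(pts, orOf(pts)));          // 6 (OR = 1111)
        System.out.println(totalDistance(pts, majorityOf(pts, 4))); // 3 (majority = 1000)
    }
}
```

The Or() is still a natural summary if the goal is storage cost rather than a centroid in the k-means sense; the example only shows that it is not a minimizer of Hamming distance.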
