mahout-user mailing list archives

From Ankit Goel <ankitgoel2...@gmail.com>
Subject Re: Modifying kmeans algo
Date Wed, 23 Sep 2015 00:51:35 GMT
Hi,
What I wanted to do was modify the clustering algorithm, in the hope of
experimenting with different versions of it. I'm not too hung up on the MR
part of things; I care about the clustering algo itself.

Secondly, or alternatively, I wanted to know which part or library calculates
the centroid (in the hope of reconfiguring when the algo splits the clusters,
increasing the number of clusters and thus calculating a new centroid).

On Tue, Sep 22, 2015 at 6:09 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> On Mon, Sep 21, 2015 at 4:44 PM, Ankit Goel <ankitgoel2004@gmail.com>
> wrote:
>
> > If one wanted to modify the kmeans algorithm given with the mahout package,
> > how would/should one go about doing it?
> >
>
> If you want to modify the old map reduce code, please go right ahead.  The
> project members will not be maintaining that code going forward, however,
> so that modification will be all yours.
>
>
> >
> > Also what is the function that can be used to find the median point between
> > 2 or more vectors? As in I want the median point in vector format so that I
> > can use it as a new center maybe.
> >
>
> It sounds like you want to compute the medoid of several vectors [1] or
> possibly the geometric median [2]. Neither is particularly easy to compute
> and Mahout supports neither.
>
> You may also have wanted the vector mean.  That is trivial to compute ...
> just add up the vectors and divide by the number of vectors.
>
>
> [1] https://en.wikipedia.org/wiki/Medoid
>
> [2] https://en.wikipedia.org/wiki/Geometric_median
>
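
For reference, the vector mean described in the quoted reply (add up the
vectors and divide by their count) and a brute-force medoid can be sketched
in plain Java. This is only an illustrative sketch, not Mahout code: the
class name VectorCenters and its methods are hypothetical, vectors are
assumed to be dense double[] arrays of equal length, and the medoid search
compares every pair of points, so it is only practical for small sets.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: plain-array sketches of two "centers".
final class VectorCenters {

    // Vector mean: sum the vectors component-wise, then divide by the count.
    static double[] mean(List<double[]> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one vector");
        }
        int dim = vectors.get(0).length;
        double[] sum = new double[dim];
        for (double[] v : vectors) {
            if (v.length != dim) {
                throw new IllegalArgumentException("vectors must have equal length");
            }
            for (int i = 0; i < dim; i++) {
                sum[i] += v[i];
            }
        }
        for (int i = 0; i < dim; i++) {
            sum[i] /= vectors.size();
        }
        return sum;
    }

    // Euclidean distance between two vectors of equal length.
    static double distance(double[] a, double[] b) {
        double squared = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            squared += d * d;
        }
        return Math.sqrt(squared);
    }

    // Medoid: index of the input vector whose total distance to all the
    // others is smallest. Brute force, quadratic in the number of vectors.
    static int medoidIndex(List<double[]> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one vector");
        }
        int best = 0;
        double bestTotal = Double.POSITIVE_INFINITY;
        for (int i = 0; i < vectors.size(); i++) {
            double total = 0.0;
            for (double[] other : vectors) {
                total += distance(vectors.get(i), other);
            }
            if (total < bestTotal) {
                bestTotal = total;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {2, 0}, new double[] {10, 3});
        System.out.println(Arrays.toString(mean(points)));
        System.out.println(medoidIndex(points));
    }
}
```

The mean is generally not one of the input points, while the medoid always
is, which matters if the new center has to be an actual data point.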



-- 
Regards,
Ankit Goel
http://about.me/ankitgoel
