mahout-user mailing list archives

From Ankit Goel <ankitgoel2...@gmail.com>
Subject Re: Kmeans clusterdump Interpretation
Date Tue, 21 Jul 2015 01:25:17 GMT
Oh, I thought kmeans gave me one of the input point vectors as the centroid,
not a calculated point at the center of the cluster. In that case, I am
looking for the most central point vector (from the index) that I can use as
a representative of the cluster.
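Since clusterdump already prints each clustered point's distance from its cluster's center (the `[distance=...]` property in the output quoted further down), one simple way to pick a representative is the point with the smallest distance. A minimal sketch in plain Java, assuming each point is printed on its own line as in that output and that you pass in the point lines of one cluster at a time. The class and method names are hypothetical, and it returns the whole line rather than an article id, because whether the title shows up in the line depends on how the vectors were built.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NearestPointInDump {

    // Matches the "[distance=...]" property printed for each clustered point.
    private static final Pattern DISTANCE =
            Pattern.compile("\\[distance=([0-9.eE+-]+)\\]");

    // Returns the point line with the smallest distance, or null if no line has one.
    static String mostCentralLine(List<String> lines) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (String line : lines) {
            Matcher m = DISTANCE.matcher(line);
            if (!m.find()) {
                continue; // header, top terms, etc.
            }
            double d = Double.parseDouble(m.group(1));
            if (d < bestDistance) {
                bestDistance = d;
                best = line;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Lines shaped like the clusterdump output quoted in this thread.
        List<String> lines = Arrays.asList(
                "Weight : [props - optional]:  Point:",
                "1.0 : [distance=0.0]: [{\"account\":0.026}]",
                "1.0 : [distance=0.3963903651622338]: [...]");
        // prints: 1.0 : [distance=0.0]: [{"account":0.026}]
        System.out.println(mostCentralLine(lines));
    }
}
```

A distance of 0.0 means that vector coincides with the centroid under whatever distance measure the kmeans run used.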

On Tue, Jul 21, 2015 at 6:41 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:

> I'm not sure centroid id is even a defined thing, especially since the
> centroid, in my understanding, is just a point in space, not necessarily a
> point in your data.
>
> Are you trying to find the most-central point in a given cluster?
>
> On Mon, Jul 20, 2015 at 5:18 PM, Ankit Goel <ankitgoel2004@gmail.com> wrote:
>
> > Hi,
> > I've been messing with Mahout 0.10 and kmeans clustering on a Solr 4.6.1
> > index. The data is news articles. The --field option for the kmeans input
> > vectors is set to "content", and the idField is set to "title" (just so I
> > can analyse it faster). The clusterdump of the kmeans result gives me
> > proper output, but I can't figure out the id of the vector chosen as the
> > center. There are only 14-15 articles, so I am not hung up on cluster
> > performance at this time.
> >
> > I used random seeds on the kmeans command line.
> > For reference, this is the clusterdump command I am executing:
> >
> > bin/mahout clusterdump -i $MAHOUT_HOME/testCluster/clusters-3-final \
> >   -p $MAHOUT_HOME/testCluster/clusteredPoints -d $MAHOUT_HOME/dict.txt -b 5
> >
> > The output I get is of the form:
> >
> > :{"r":
> >
> > top terms
> >
> > xxxxx==>xxxxx
> >
> > Weight : [props - optional]:  Point:
> >
> >  1.0 : [distance=0.0]: [{"account":0.026}.......other features]
> >
> > 1.0 : [distance=0.3963903651622338]: [....]
> >
> >
> > So how exactly do I get the centroid id? I have even tried accessing it
> > from Java with value.getValue().getCenter() on a ClusterWritable, but this
> > just gives me the features and values of the centroid.
> >
> > Also, please explain the meaning of "account":0.026 (just making sure I
> > understand it correctly). I used tfidf weighting.
> >
> > --
> > Regards,
> > Ankit Goel
> > http://about.me/ankitgoel
> >
>
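To illustrate Andrew's point: k-means moves each centroid to the element-wise mean of the vectors assigned to it, so the centroid is generally not one of the articles. Each entry such as "account":0.026 pairs a dictionary term with its weight in that vector (here a tf-idf weight; the exact value depends on the vectorizer's weighting and normalization settings). The following sketch uses made-up titles and weights and assumes Euclidean distance, which may not be the measure the kmeans run was configured with. It computes the centroid as a mean and returns the title of the member closest to it; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class CentroidDemo {

    // Builds a sparse term -> weight vector from alternating (term, weight) arguments.
    static Map<String, Double> vec(Object... termWeightPairs) {
        Map<String, Double> v = new HashMap<>();
        for (int i = 0; i < termWeightPairs.length; i += 2) {
            v.put((String) termWeightPairs[i], (Double) termWeightPairs[i + 1]);
        }
        return v;
    }

    // Hypothetical cluster of three articles keyed by title; weights are made up.
    static Map<String, Map<String, Double>> sampleDocs() {
        Map<String, Map<String, Double>> docs = new LinkedHashMap<>();
        docs.put("doc A", vec("account", 0.4, "bank", 0.2));
        docs.put("doc B", vec("bank", 0.6, "loan", 0.3));
        docs.put("doc C", vec("account", 0.2, "bank", 0.4, "loan", 0.1));
        return docs;
    }

    // Element-wise mean of the member vectors; a missing term counts as 0.
    static Map<String, Double> centroid(Map<String, Map<String, Double>> docs) {
        Map<String, Double> sum = new HashMap<>();
        for (Map<String, Double> v : docs.values()) {
            for (Map.Entry<String, Double> e : v.entrySet()) {
                sum.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        Map<String, Double> mean = new HashMap<>();
        for (Map.Entry<String, Double> e : sum.entrySet()) {
            mean.put(e.getKey(), e.getValue() / docs.size());
        }
        return mean;
    }

    // Euclidean distance between two sparse vectors (an assumed measure).
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        Set<String> terms = new HashSet<>(a.keySet());
        terms.addAll(b.keySet());
        double sq = 0.0;
        for (String t : terms) {
            double diff = a.getOrDefault(t, 0.0) - b.getOrDefault(t, 0.0);
            sq += diff * diff;
        }
        return Math.sqrt(sq);
    }

    // Title of the member vector closest to the centroid.
    static String nearestToCentroid(Map<String, Map<String, Double>> docs) {
        Map<String, Double> c = centroid(docs);
        String bestTitle = null;
        double best = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> e : docs.entrySet()) {
            double d = distance(e.getValue(), c);
            if (d < best) {
                best = d;
                bestTitle = e.getKey();
            }
        }
        return bestTitle;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> docs = sampleDocs();
        // The centroid (about account=0.2, bank=0.4, loan=0.133) matches none of the articles.
        System.out.println("centroid: " + centroid(docs));
        System.out.println("nearest: " + nearestToCentroid(docs)); // prints "nearest: doc C"
    }
}
```

Picking the member nearest the centroid is only one notion of "most central"; a medoid (the member with the smallest total distance to the other members) is a common alternative.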



-- 
Regards,
Ankit Goel
http://about.me/ankitgoel
