mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: How to choose the initial clusters for K-means from TF-IDF vectors
Date Mon, 17 Nov 2014 23:54:29 GMT
Canopy is deprecated. 

It depends on what you want to do. Are you trying to find the closest documents to some set that
you have hand-classified?

All Canopy does is seed centroids for k-means to start with. K-means then iterates towards “better”
ones. You can do the same with your docs, but they won’t remain the centroids after iteration.
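
To illustrate the point about seeds not surviving iteration, here is a minimal sketch in plain Python. It is not Mahout code: the toy vectors stand in for TF-IDF vectors, and the names `kmeans`, `cosine_distance`, `mean_vector`, `docs` and `seeds` are all made up for this example. It seeds k-means with two hand-picked documents and then lets the algorithm move the centroids.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def kmeans(vectors, seed_indices, max_iter=10):
    """K-means whose initial centroids are copies of the chosen documents."""
    centroids = [list(vectors[i]) for i in seed_indices]
    assignment = None
    for _ in range(max_iter):
        # Assign every document to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so we have converged
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return centroids, assignment

# Hypothetical toy "TF-IDF" vectors over a three-term vocabulary.
docs = [
    [1.0, 0.0, 0.0],  # doc 0, hand-picked seed for cluster 0
    [0.8, 0.2, 0.0],
    [0.0, 0.0, 1.0],  # doc 2, hand-picked seed for cluster 1
    [0.0, 0.3, 0.9],
]
seeds = [0, 2]
centroids, assignment = kmeans(docs, seeds)
```

After the first update step each centroid is the mean of the documents assigned to it, so `centroids[0]` is no longer identical to `docs[0]`. If you need the chosen documents to stay fixed as reference points, that is closer to a nearest-neighbour lookup against those documents than to k-means.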

On Nov 17, 2014, at 2:11 PM, Sean Farrell <> wrote:

Hi Donni,

I believe that the canopy clustering algorithm will do what you want,
though I haven't played around with it myself yet. The clustering chapter
in the 'Mahout in Action' book covers this fairly well.



----------------------
Dr Sean Farrell
Data Scientist

On Tue, Nov 18, 2014 at 12:01 AM, Donni Khan <> wrote:

> Hi All,
> I'm working with text clustering. I want to select specific documents (as
> vectors) to be centroids for k-means.
> I have created the TF-IDF vectors for my dataset using Mahout, and I would like
> to choose the initial clusters from the TF-IDF vectors.
> Does anyone have an idea how I can do it with Mahout?
> Many thanks in advance.
> Donni
