mahout-user mailing list archives

From Paul Mahon
Subject Re: Finding thresholds for canopy
Date Wed, 27 Apr 2011 20:55:33 GMT
No, I mean the area (volume) of the space. If all the vectors fit in an
AxBxC-sized box, and you expect about 10 clusters, you could make an initial
guess that the clusters will be (A/10)xBxC in size and you could try T1=(A/10)*B*C.
I've no idea how well this would work in practice... probably not very 
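
A minimal sketch of the bounding-box guess described above, in plain Python rather than Mahout code. The function names and the sample points are hypothetical, chosen only for illustration. Note that (A/10)*B*C is a volume, while T1 and T2 are distance thresholds, so the last function takes the d-th root to get a value in distance units; that step is an assumption added here and is not part of the suggestion in this thread. Either value is only a rough starting point to tune from.

```python
from math import prod

def bounding_box_extents(vectors):
    """Side length of the axis-aligned bounding box along each dimension."""
    dims = range(len(vectors[0]))
    return [max(v[d] for v in vectors) - min(v[d] for v in vectors) for d in dims]

def volume_per_cluster(vectors, expected_clusters):
    """Literal reading of the suggestion: box volume divided by the cluster count.

    For a 3-D box and 10 clusters this equals (A/10)*B*C.
    """
    return prod(bounding_box_extents(vectors)) / expected_clusters

def length_scale_guess(vectors, expected_clusters):
    """Assumption (not from the thread): d-th root of the per-cluster volume,
    which converts the volume into a quantity with distance units."""
    d = len(vectors[0])
    return volume_per_cluster(vectors, expected_clusters) ** (1.0 / d)

# Hypothetical sample data: four 3-D vectors.
points = [(0.0, 0.0, 0.0), (10.0, 2.0, 1.0), (5.0, 4.0, 2.0), (20.0, 1.0, 0.5)]

print(bounding_box_extents(points))    # extents A, B, C
print(volume_per_cluster(points, 10))  # (A/10)*B*C
print(length_scale_guess(points, 10))  # cube root of the value above
```

If any dimension has zero extent (all vectors share the same value there), the volume collapses to zero, so such dimensions would need to be dropped or handled separately before using this estimate.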

On 04/27/2011 01:50 PM, Camilo Lopez wrote:
> By area of the space you mean just the total number of vectors I'm using?
> On 2011-04-27, at 4:46 PM, Paul Mahon wrote:
>> If you have a guess at how many clusters you want you could take the total area of
>> the space and divide by the number of clusters to get an initial guess of T2 or T1. That might
>> work to get you started, depending on the distribution.
>> On 04/27/2011 12:39 PM, Camilo Lopez wrote:
>>> I'm using Canopy as a first step for K-means clustering. Is there any algorithm,
>>> or even a good heuristic, to estimate good T1 and T2 from the vectorized data?
