mahout-user mailing list archives

From Jeff Eastman
Subject Re: Judging the quality of clustering
Date Thu, 17 May 2012 12:58:07 GMT
Yes, that is the paper I used to implement CDbw. I've tried it a few 
times along with the simpler ClusterEvaluator metrics I took from Mahout 
In Action and they look to be reasonable - see the tests - though I have 
no way to judge their absolute values. Anything you can contribute in 
this area would be most welcome. Perhaps a wiki page?

On 5/16/12 1:14 PM, Pat Ferrel wrote:
> The reference was in the code for 
> On 5/16/12 9:56 AM, Pat Ferrel wrote:
>> Thanks, I've been looking at that. Is there a description of how to 
>> interpret those values? An academic paper maybe? The intra-cluster 
>> distance intuitively seems to correspond to something like cohesion. 
>> I don't get the intuition behind inter-cluster distances but Ted 
>> thinks they are the most important.
>> On 5/16/12 7:32 AM, Jeff Eastman wrote:
>>> Mahout has a ClusterEvaluator and a CDbwEvaluator that compute some 
>>> quality metrics (inter-cluster distance, intra-cluster distance, 
>>> ...) that you may find useful. Both calculate a set of 
>>> representative points from the clustering output and compute the 
>>> (n^2) metrics over these points rather than all of the points in 
>>> each cluster.
>>> On 5/15/12 4:46 PM, Pat Ferrel wrote:
>>>> So many questions about the best k, how to choose t1 and t2, and how 
>>>> much dimensional reduction helps would have clear answers if we had a 
>>>> way to judge the quality of clusters.
>>>> Various methods were discussed here for a time: 
>>>> Has there been any work on building a measure of quality?
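
To make the two metrics named in the thread concrete, here is a minimal sketch of intra-cluster and inter-cluster distance computed over a few representative points per cluster, as described above. It is not Mahout's ClusterEvaluator or CDbwEvaluator code: the function names, the choice of Euclidean distance, and the averaging scheme are all assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def intra_cluster_distance(clusters):
    """Mean pairwise distance between representative points of the same
    cluster, averaged over clusters (hypothetical definition)."""
    per_cluster = []
    for points in clusters.values():
        pairs = list(combinations(points, 2))
        if pairs:  # clusters with a single representative are skipped
            per_cluster.append(
                sum(euclidean(a, b) for a, b in pairs) / len(pairs)
            )
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0


def inter_cluster_distance(clusters):
    """Mean pairwise distance between the centroids of each cluster's
    representative points (hypothetical definition)."""
    centers = [centroid(points) for points in clusters.values()]
    pairs = list(combinations(centers, 2))
    if not pairs:
        return 0.0
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


# Made-up representative points for two clusters.
representatives = {
    "c0": [[0.0, 0.0], [0.0, 2.0]],
    "c1": [[10.0, 0.0], [10.0, 2.0]],
}

intra = intra_cluster_distance(representatives)
inter = inter_cluster_distance(representatives)
print("intra-cluster distance:", intra)
print("inter-cluster distance:", inter)
```

Under these definitions, a small intra-cluster distance suggests cohesive clusters and a large inter-cluster distance suggests well-separated ones. Because the absolute values are hard to judge, the numbers are probably most useful for comparing runs on the same data, for example with different k or different t1 and t2.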
