mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Judging the quality of clustering
Date Wed, 16 May 2012 17:14:31 GMT
The reference was in the code for 
http://www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf

On 5/16/12 9:56 AM, Pat Ferrel wrote:
> Thanks, I've been looking at that. Is there a description of how to 
> interpret those values? An academic paper maybe? The intra-cluster 
> distance intuitively seems to correspond to something like cohesion. I 
> don't get the intuition behind inter-cluster distances but Ted thinks 
> they are the most important.
>
> On 5/16/12 7:32 AM, Jeff Eastman wrote:
>> Mahout has a ClusterEvaluator and a CDbwEvaluator that compute some 
>> quality metrics (inter-cluster distance, intra-cluster distance, ...) 
>> that you may find useful. Both calculate a set of representative 
>> points from the clustering output and compute the O(n^2) metrics over 
>> these points rather than all of the points in each cluster.
>>
>> On 5/15/12 4:46 PM, Pat Ferrel wrote:
>>> So many questions about best k, how to choose t1 and t2, and how much 
>>> dimensional reduction helps would have clear answers if we had a 
>>> way to judge the quality of clusters.
>>>
>>> Various methods were discussed here for a time: 
>>> http://www.lucidimagination.com/search/document/dab8c1f3c3addcfe/validating_clustering_output
>>>
>>> Has there been any work on building a measure of quality?
>>>
>>>
>>
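
For readers trying to build intuition for the two metrics discussed in this thread, here is a minimal sketch in plain Java. It is not Mahout's ClusterEvaluator or CDbwEvaluator code; the class name ClusterQualitySketch and all of its methods are hypothetical, and it assumes Euclidean distance and that each cluster has already been reduced to a small list of representative points. Intra-cluster distance is taken as the mean pairwise distance among a cluster's representative points (lower suggests tighter, more cohesive clusters), and inter-cluster distance as the mean pairwise distance between cluster centroids (higher suggests better separated clusters).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of the Mahout API. */
class ClusterQualitySketch {

    /** Euclidean distance between two points of equal dimension (an assumed measure). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Mean distance over all unordered pairs of points; 0 if fewer than two points. */
    static double meanPairwiseDistance(List<double[]> points) {
        double total = 0.0;
        int pairs = 0;
        for (int i = 0; i < points.size(); i++) {
            for (int j = i + 1; j < points.size(); j++) {
                total += distance(points.get(i), points.get(j));
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : total / pairs;
    }

    /** Component-wise mean of a non-empty list of points. */
    static double[] centroid(List<double[]> points) {
        double[] c = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i];
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= points.size();
        }
        return c;
    }

    /** Average, over clusters, of the mean pairwise distance among representative points. */
    static double intraClusterDistance(List<List<double[]>> clusters) {
        double total = 0.0;
        for (List<double[]> cluster : clusters) {
            total += meanPairwiseDistance(cluster);
        }
        return clusters.isEmpty() ? 0.0 : total / clusters.size();
    }

    /** Mean pairwise distance between the centroids of the clusters. */
    static double interClusterDistance(List<List<double[]>> clusters) {
        List<double[]> centroids = new ArrayList<>();
        for (List<double[]> cluster : clusters) {
            centroids.add(centroid(cluster));
        }
        return meanPairwiseDistance(centroids);
    }
}
```

Under these assumptions, comparing two clusterings of the same data comes down to preferring the one with the smaller intra-cluster value and the larger inter-cluster value. Working over a few representative points per cluster, as Jeff describes, keeps the pairwise loops cheap even when the clusters themselves are large.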
