mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: Cluster Evaluation 0.8 style
Date Wed, 11 Jul 2012 19:40:39 GMT
As I've said before, this issue is still a problem.
It should be reopened, and I sent you a link to get my data (only 8G, 
good luck!)

My confusion with the per-cluster density measure arises because in 0.8 
an output file is required for clusterdump, but the per-cluster density 
measure is not written to it. It only appears in the INFO output on 
STDOUT. When I run a batch of these the STDOUT is lost, so I'll have to 
modify my scripts or update my KFinder code. I'd vote to include it in 
the output file in the future.

The only problem I've seen with the per-cluster intra-cluster density is 
that I sometimes get a lot of pruned clusters, and the intra-cluster 
density is not calculated for them. I think we've discussed this in the 

12/07/11 12:22:12 INFO evaluation.ClusterEvaluator: Intra-Cluster Density[766] = 0.6243875150474454
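
Until the measure lands in the output file, one workaround is to save the STDOUT of each run and pull the values out afterwards. The following is a minimal sketch of that idea, not part of Mahout. It assumes the log record format shown above; the function name is hypothetical, and other Mahout versions may format the line differently.

```python
import re

# Pattern derived from the single log record quoted in this message.
# Assumption: every per-cluster record has this shape. The \s+ also
# tolerates a record that a mail client wrapped after "Intra-Cluster".
DENSITY_RE = re.compile(
    r"Intra-Cluster\s+Density\[(\d+)\]\s*=\s*([0-9.eE+-]+|NaN)"
)


def parse_intra_cluster_densities(log_text):
    """Return {cluster_id: density} for every matching record in log_text."""
    densities = {}
    for match in DENSITY_RE.finditer(log_text):
        cluster_id = int(match.group(1))
        densities[cluster_id] = float(match.group(2))
    return densities
```

Pruned clusters produce no record, so they are simply absent from the returned dictionary. A script can compare its keys against the expected cluster ids to see which ones were skipped.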

I would really like to get this stuff working and am willing to provide 
whatever help you need if you are in a position to work on it. I have 
0.8-SNAPSHOT building. I am inexperienced at debugging with data this 
large, but willing to learn. If you'd like me to try something out, just 
point me in the right direction.

I'm also happy to test Ted's inter-cluster stuff.

On 7/11/12 11:46 AM, Jeff Eastman wrote:
> The ClusterEvaluator has methods for both inter-cluster density and 
> intra-cluster density. The former computes the density using the 
> cluster centers, while the latter uses a set of representative points 
> extracted from the clustered points. This reduces the computational 
> overhead of calculating a density from all of the points from each 
> cluster.
> The unit test uses synthetic data and produces reasonable looking 
> results afaict. Have you had negative experiences with that?
> On 7/11/12 1:21 PM, Pat Ferrel wrote:
>> ...
>> It was my understanding that the ClusterEvaluator included an attempt 
>> to provide this measure with intra-cluster density per cluster though 
>> it looks like that output has been removed?
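
The distinction quoted above (inter-cluster density from the cluster centers, intra-cluster density from a few representative points per cluster) can be illustrated with a small sketch. This is not the ClusterEvaluator code. The formula used here, a mean pairwise distance rescaled by the smallest and largest pair distance, is an assumption chosen for illustration, and all function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalized_mean_pairwise_distance(points, distance=euclidean):
    """Mean pairwise distance, min-max rescaled to [0, 1].

    Assumed formula, for illustration only. Returns NaN when fewer than
    two points are given or when all pair distances are equal.
    """
    dists = [distance(p, q) for p, q in combinations(points, 2)]
    if not dists:
        return float("nan")
    lo, hi = min(dists), max(dists)
    if hi == lo:
        return float("nan")
    return (sum(dists) / len(dists) - lo) / (hi - lo)


def inter_cluster_density(centers):
    """One value for the whole clustering, computed from the centers only."""
    return normalized_mean_pairwise_distance(centers)


def intra_cluster_densities(representative_points):
    """One value per cluster, computed from its representative points.

    representative_points maps cluster id -> list of vectors.
    """
    return {
        cluster_id: normalized_mean_pairwise_distance(points)
        for cluster_id, points in representative_points.items()
    }
```

Using a handful of representative points keeps the pairwise work per cluster small and independent of cluster size, which is the cost saving described in the quoted text.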
