mahout-user mailing list archives

From Jeff Eastman <>
Subject Re: MAHOUT-236 Cluster Evaluation Tools?
Date Wed, 07 Apr 2010 01:33:37 GMT
Hi Robin,

Great! I've got the refactoring changes for consolidating all the
various cluster types under a Cluster interface (formerly Printable, but
now with id, numPoints and a center added). Dirichlet models don't yet
have meaningful ids implemented, but they all do (so far anyway) have
a notion of "numPoints" and a "center". Tomorrow I'll work on tests to
make sure the ClusterDumper actually works with Dirichlet clusters, then
I will commit that, most likely Wednesday or Thursday.
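
For illustration only, here is a minimal Java sketch of what a consolidated
interface along these lines could look like. The names ClusterSketch,
SimpleCluster and DumperSketch, the method signatures, and the output format
are all hypothetical and are not taken from the Mahout source. The point is
only that a dumper written against the shared interface does not need to know
which clustering algorithm produced a cluster.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a shared cluster interface: an id, a point
// count, a center, and a printable form. Not the actual Mahout API.
interface ClusterSketch {
    int getId();
    int getNumPoints();
    double[] getCenter();
    String asFormatString();
}

// Hypothetical concrete cluster; any algorithm's cluster type could
// implement the same interface.
final class SimpleCluster implements ClusterSketch {
    private final int id;
    private final int numPoints;
    private final double[] center;

    SimpleCluster(int id, int numPoints, double[] center) {
        this.id = id;
        this.numPoints = numPoints;
        this.center = center.clone(); // defensive copy
    }

    public int getId() { return id; }
    public int getNumPoints() { return numPoints; }
    public double[] getCenter() { return center.clone(); }

    // Formats the cluster as C-<id>{n=<numPoints> c=[x1, x2, ...]}.
    public String asFormatString() {
        StringBuilder sb = new StringBuilder();
        sb.append("C-").append(id).append("{n=").append(numPoints).append(" c=[");
        for (int i = 0; i < center.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.format(Locale.ROOT, "%.3f", center[i]));
        }
        return sb.append("]}").toString();
    }
}

// Hypothetical dumper that depends only on the interface.
final class DumperSketch {
    static String dump(List<? extends ClusterSketch> clusters) {
        StringBuilder sb = new StringBuilder();
        for (ClusterSketch c : clusters) {
            sb.append(c.asFormatString()).append('\n');
        }
        return sb.toString();
    }
}
```

With this sketch, new SimpleCluster(7, 42, new double[] {1.0, 2.5})
formats as C-7{n=42 c=[1.000, 2.500]}.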

BTW, I changed my mind about foisting off the old Printable interface on 
Vectors (but am still open to the idea if somebody actually working in 
math thinks it is worth doing). All the new Clusters use the vector 
formatting done in ClusterBase.

What I'd really like is feedback from ClusterDumper users on what is 
working and what is needed to address MAHOUT-236. That includes you, right?


PS: Ted, you expressed some doubts about the value of consolidating 
Dirichlet clusters with the others. So far it seems to be a reasonable 
fit but I'm doing the engineering on a tiny subset of simple models 
without enough theoretical insight to see any pitfalls ahead. Is there a 
"DistanceMeasure-like" discussion that might provide a firmer 
underpinning for this work?

Robin Anil wrote:
> No one yet. I am willing to help in case you need an extra pair of hands on
> this one.
> Robin
