mahout-user mailing list archives

From Robin Anil <>
Subject Re: MAHOUT-236 Cluster Evaluation Tools?
Date Wed, 07 Apr 2010 01:57:09 GMT
Hi Jeff,
This is a good paper with a simple measure of cluster quality
based on intra-cluster density and inter-cluster separation. It's
pretty easy to compute. Need to make it a map/reduce job.
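
To illustrate the general idea of a density/separation measure, here is a minimal
sketch. It is not the formula from the paper, which is not reproduced in this
message. It assumes Euclidean distance, takes intra-cluster spread as the mean
distance of points to their own centroid, and takes separation as the mean
pairwise distance between centroids. All function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(component) / n for component in zip(*points)]


def intra_cluster_spread(clusters):
    """Mean distance from each point to its own cluster centroid.

    Lower spread means denser clusters. (Assumed definition, not the paper's.)
    """
    total = 0.0
    count = 0
    for points in clusters:
        center = centroid(points)
        for p in points:
            total += euclidean(p, center)
            count += 1
    return total / count


def inter_cluster_separation(clusters):
    """Mean pairwise distance between cluster centroids."""
    if len(clusters) < 2:
        raise ValueError("separation needs at least two clusters")
    centers = [centroid(points) for points in clusters]
    pairs = list(combinations(centers, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)


def quality(clusters):
    """Separation divided by spread; higher suggests tighter, better-separated clusters."""
    spread = intra_cluster_spread(clusters)
    if spread == 0.0:
        return math.inf  # every point sits exactly on its centroid
    return inter_cluster_separation(clusters) / spread


if __name__ == "__main__":
    clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
    print(quality(clusters))
```

The per-cluster distance sums and point counts are independent of each other,
so in a map/reduce version they could plausibly be emitted keyed by cluster id
and summed in a reducer, with the centroid-to-centroid step done afterwards.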

On Wed, Apr 7, 2010 at 7:03 AM, Jeff Eastman <> wrote:

> Hi Robin,
> Great! I've got the refactoring changes for consolidating all the various
> cluster types under a Cluster interface (formerly Printable but now with id,
> numPoints and a center added). Dirichlet models still don't yet have
> meaningful ids implemented but they all do (so far anyway) have a notion of
> "numPoints" and a "center". I'm working on tests tomorrow to make sure the
> ClusterDumper actually works with Dirichlet clusters then I will commit
> that. Wednesday or Thursday most likely.
> BTW, I changed my mind about foisting off the old Printable interface on
> Vectors (but am still open to the idea if somebody actually working in math
> thinks it is worth doing). All the new Clusters use the vector formatting
> done in ClusterBase.
> What I'd really like is feedback from ClusterDumper users on what is
> working and what is needed to address MAHOUT-236. That includes you, right?
> Jeff
> PS: Ted, you expressed some doubts about the value of consolidating
> Dirichlet clusters with the others. So far it seems to be a reasonable fit
> but I'm doing the engineering on a tiny subset of simple models without
> enough theoretical insight to see any pitfalls ahead. Is there a
> "DistanceMeasure-like" discussion that might provide a firmer underpinning
> for this work?
> Robin Anil wrote:
>> No one yet. I am willing to help in case you need an extra pair of hands
>> on this one.
>> Robin
