mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: MAHOUT-236 Cluster Evaluation Tools?
Date Wed, 07 Apr 2010 01:57:09 GMT
Hi Jeff,
This is a good paper with a simple cluster quality measure based on
intra-cluster density and inter-cluster separation. It's pretty easy to
compute. We would need to make it a map/reduce job.
http://docs.google.com/viewer?a=v&q=cache:z5p9n04cBQEJ:www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf+clustering+quality&hl=en&gl=in&pid=bl&srcid=ADGEESiC-ocW6IWrKR4cb1t1ZqkzRKQ3tDv4UFBkVaUKU0gG3kADcPWIjs-60A0912nu8MFPsVM3pf9jKrP98dL-B-BaiOC9LObBS3VkJK6Mu6josZtVegLxp3BftduD3hFxtGOVZK_b&sig=AHIEtbSZwtgw9wmJoojQn7Dlz5OL67vICw
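
To make the idea concrete, here is a rough in-memory sketch of a
density/separation style score. This is NOT the exact measure from the paper:
it assumes Euclidean distance, uses mean distance to the centroid as a
stand-in for intra-cluster density, and uses the smallest centroid-to-centroid
distance as the separation. The class and method names are made up for
illustration and are not part of Mahout.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a scatter/separation cluster quality score. */
public final class ClusterQualitySketch {

    /** Euclidean distance between two points of equal dimension (assumed metric). */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Component-wise mean of the points in one non-empty cluster. */
    public static double[] centroid(List<double[]> points) {
        double[] c = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i];
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= points.size();
        }
        return c;
    }

    /** Mean distance from each point to its own centroid (lower means tighter clusters). */
    public static double intraClusterScatter(List<List<double[]>> clusters) {
        double total = 0.0;
        long count = 0;
        for (List<double[]> cluster : clusters) {
            double[] c = centroid(cluster);
            for (double[] p : cluster) {
                total += distance(p, c);
                count++;
            }
        }
        return total / count;
    }

    /** Smallest distance between any two centroids (higher means better separated). */
    public static double interClusterSeparation(List<List<double[]>> clusters) {
        double min = Double.POSITIVE_INFINITY;
        for (int i = 0; i < clusters.size(); i++) {
            for (int j = i + 1; j < clusters.size(); j++) {
                double d = distance(centroid(clusters.get(i)), centroid(clusters.get(j)));
                min = Math.min(min, d);
            }
        }
        return min;
    }

    /** Separation divided by scatter; larger values suggest a better clustering. */
    public static double quality(List<List<double[]>> clusters) {
        return interClusterSeparation(clusters) / intraClusterScatter(clusters);
    }

    public static void main(String[] args) {
        List<List<double[]>> clusters = Arrays.asList(
            Arrays.asList(new double[] {0, 0}, new double[] {0, 2}),
            Arrays.asList(new double[] {10, 0}, new double[] {10, 2}));
        System.out.println("scatter    = " + intraClusterScatter(clusters));
        System.out.println("separation = " + interClusterSeparation(clusters));
        System.out.println("quality    = " + quality(clusters));
    }
}
```

The per-cluster distance sums and point counts are additive, so in principle
they could be accumulated per cluster id in a map/reduce pass, with the
centroid comparison done on the much smaller set of cluster centers
afterwards.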
Robin


On Wed, Apr 7, 2010 at 7:03 AM, Jeff Eastman <jdog@windwardsolutions.com> wrote:

> Hi Robin,
>
> Great! I've got the refactoring changes for consolidating all the various
> cluster types under a Cluster interface (formerly Printable but now with id,
> numPoints and a center added). Dirichlet models still don't yet have
> meaningful ids implemented but they all do (so far anyway) have a notion of
> "numPoints" and a "center". I'm working on tests tomorrow to make sure the
> ClusterDumper actually works with Dirichlet clusters then I will commit
> that. Wednesday or Thursday most likely.
>
> BTW, I changed my mind about foisting off the old Printable interface on
> Vectors (but am still open to the idea if somebody actually working in math
> thinks it is worth doing). All the new Clusters use the vector formatting
> done in ClusterBase.
>
> What I'd really like is feedback from ClusterDumper users on what is
> working and what is needed to address MAHOUT-236. That includes you, right?
>
> Jeff
>
> PS: Ted, you expressed some doubts about the value of consolidating
> Dirichlet clusters with the others. So far it seems to be a reasonable fit
> but I'm doing the engineering on a tiny subset of simple models without
> enough theoretical insight to see any pitfalls ahead. Is there a
> "DistanceMeasure-like" discussion that might provide a firmer underpinning
> for this work?
>
>
>
>
> Robin Anil wrote:
>
>> No one yet. I am willing to help in case you need an extra pair of hands on
>> this one.
>>
>> Robin
>>
>>
>
