mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: MAHOUT-236 Cluster Evaluation Tools?
Date Thu, 08 Apr 2010 04:20:56 GMT
On Wed, Apr 7, 2010 at 11:50 PM, Jeff Eastman <jdog@windwardsolutions.com> wrote:

> Hi Robin,
>
> Interesting paper. I'm beginning to see how to MR the representative point
> selection already. The rest will hopefully become clearer with more study.
> Lots of MR jobs are needed to:
>
> a) get the data into Vectors,

We have something for text, missing for other formats.

> b) iterate (e.g. kmeans) over the data to produce a set of clusters,

Done.

> c) cluster the data,

Done.

> d) iterate over the clustered data to derive representative points for each
> cluster, and finally

Done ;)

> e) produce the CDbw.

TODO.
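
A rough single-machine sketch of steps d) and e), in Python rather than as MR jobs, just to make the idea concrete. It assumes farthest-first selection of representative points and uses a toy separation/compactness ratio as a stand-in for CDbw; it is not the formula from the paper. The names `representative_points` and `toy_quality` are made up for illustration and are not Mahout APIs.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def representative_points(points, r):
    """Pick up to r well-scattered points from one cluster (farthest-first).

    Start with the point farthest from the centroid, then repeatedly add
    the point whose nearest already-chosen point is farthest away.
    """
    c = centroid(points)
    chosen = [max(points, key=lambda p: dist(p, c))]
    while len(chosen) < min(r, len(points)):
        chosen.append(
            max(points, key=lambda p: min(dist(p, q) for q in chosen))
        )
    return chosen


def toy_quality(clusters, r=3):
    """Illustrative score only (NOT CDbw): separation divided by compactness.

    compactness: mean distance from each point to the nearest
                 representative of its own cluster (smaller is tighter).
    separation:  smallest distance between representatives of two
                 different clusters (larger is better separated).
    """
    reps = [representative_points(c, r) for c in clusters]

    total, count = 0.0, 0
    for cluster, cluster_reps in zip(clusters, reps):
        for p in cluster:
            total += min(dist(p, q) for q in cluster_reps)
            count += 1
    compactness = total / count

    separation = min(
        dist(p, q)
        for i, a in enumerate(reps)
        for b in reps[i + 1:]
        for p in a
        for q in b
    )
    return separation / compactness if compactness > 0 else math.inf


if __name__ == "__main__":
    square = [(0, 0), (0, 1), (1, 0), (1, 1)]
    far = [(x + 10, y + 10) for x, y in square]
    near = [(x + 2, y + 2) for x, y in square]
    print("well separated:", toy_quality([square, far]))
    print("close together:", toy_quality([square, near]))
```

Higher is better for this toy score, so a run with different clustering parameters can be compared by score. In an MR setting, the per-cluster selection in `representative_points` would presumably run in the reducers keyed by cluster id, but that split is an assumption here.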




> And, of course, all of this is iterated again with different values for the
> clustering algorithm's parameters. Should keep the lights on at PG&E
> producing power for the server farms.
>
>
>
> Robin Anil wrote:
>
>> Hi Jeff,
>>            This is a good paper with a simple measure of cluster quality
>> based on intra-cluster density and inter-cluster separation. It's
>> pretty easy to compute. We need to make it a map/reduce job.
>>
>> http://docs.google.com/viewer?a=v&q=cache:z5p9n04cBQEJ:www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf+clustering+quality&hl=en&gl=in&pid=bl&srcid=ADGEESiC-ocW6IWrKR4cb1t1ZqkzRKQ3tDv4UFBkVaUKU0gG3kADcPWIjs-60A0912nu8MFPsVM3pf9jKrP98dL-B-BaiOC9LObBS3VkJK6Mu6josZtVegLxp3BftduD3hFxtGOVZK_b&sig=AHIEtbSZwtgw9wmJoojQn7Dlz5OL67vICw
>> Robin
>>
>>
>>
>>
>
>
