mahout-dev mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: [jira] Commented: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)
Date Mon, 16 Aug 2010 14:12:22 GMT
>
> In this case, each document needs to be paired with its nearest cluster,
> right? Something like canopy clustering should give a partially connected
> graph.
>

Just populate similarity values for documents that share a canopy. The graph
will be very sparse but still connected, because canopies overlap.
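
A minimal sketch of that idea in Python (not Mahout code). The function
names, the Euclidean distance, the thresholds t1 > t2, and the
1 / (1 + distance) similarity are all assumptions made for illustration.
This version tests every point against each canopy center, which is one
common variant. Whether the resulting graph is actually connected depends on
the thresholds chosen.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_canopies(points, t1, t2, distance=euclidean):
    """Group point indices into overlapping canopies.

    t1 is the loose threshold (canopy membership), t2 the tight one
    (removal from the list of candidate centers). Requires t1 > t2 > 0.
    """
    assert t1 > t2 > 0
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within t1 of the center joins this canopy,
        # even if it already belongs to another one.
        members = [i for i in range(len(points))
                   if distance(points[center], points[i]) < t1]
        canopies.append(members)
        # Points within t2 can no longer become centers.
        remaining = [i for i in remaining
                     if distance(points[center], points[i]) >= t2]
    return canopies


def sparse_similarity(points, canopies, distance=euclidean):
    """Compute similarities only for pairs that share a canopy."""
    sim = {}
    for members in canopies:
        for i, j in combinations(sorted(members), 2):
            if (i, j) not in sim:
                # Hypothetical similarity: 1 for identical points,
                # approaching 0 as distance grows.
                sim[(i, j)] = 1.0 / (1.0 + distance(points[i], points[j]))
    return sim
```

For five points spaced one unit apart on a line, with t1 = 1.5 and t2 = 0.5,
this stores 7 of the 10 possible pairs. Neighbouring canopies share points,
so the stored edges still form a chain through the whole set.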

>
> Robin
>
>
>
>
>>  On Mon, Aug 16, 2010 at 7:00 AM, Robin Anil <robin.anil@gmail.com>
>> wrote:
>>
>> > From a GSoC angle, it needn't be done; it's up to your mentor to
>> > decide. I am more interested in getting this completed and pushed out
>> > so that people can really use it. If you can spare time after GSoC,
>> > stay around the community, and help get this polished, that would be
>> > great.
>> >
>> > To create your pairwise similarity matrix (values in 0-1, where 1
>> > means dissimilar; it can be the other way around as well), see the
>> > DistanceMeasure implementations. Creating the pairwise matrix is
>> > nontrivial from a scalability standpoint.
>> >
>> > A complete spectral clustering package should take an input set of
>> > documents, create the matrix, run the clustering, and output the
>> > clusters. To get an idea of your work so far: which blocks are missing
>> > from this ideal package scenario?
>> >
>> >
>> > Robin
>> >
>>
>
>
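
For the pairwise matrix mentioned in the quoted message, here is a rough
sketch of turning a 0-1 distance into a 0-1 similarity. It is plain Python
rather than a Mahout DistanceMeasure, and the names cosine_distance and
pairwise_similarity are made up for illustration. It assumes non-negative
vectors (for example term frequencies), so the cosine distance stays in
[0, 1].

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 0 means same direction, 1 means orthogonal
    (for non-negative vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: treat an empty vector as maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def pairwise_similarity(vectors):
    """Dense symmetric similarity matrix, with similarity = 1 - distance."""
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0
        for j in range(i + 1, n):
            s = 1.0 - cosine_distance(vectors[i], vectors[j])
            sim[i][j] = sim[j][i] = s
    return sim
```

The double loop evaluates n * (n - 1) / 2 distances and stores n * n
entries, which is the scalability problem: it is fine for a few thousand
documents and impractical well before a million.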
