mahout-dev mailing list archives

From "Ted Dunning (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-30) dirichlet process implementation
Date Wed, 12 Nov 2008 17:37:44 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646967#action_12646967 ]

Ted Dunning commented on MAHOUT-30:
-----------------------------------

Jeff,

These look like really nice refactorings.  The process is nice and clear.

The only key trick that may confuse people is that each step is a sampling step.  Thus assignment
to clusters does NOT assign each point to the best cluster; it picks a cluster at random, biased
by the mixture parameters and the model pdfs.  Likewise, model computation does NOT compute the
best model; it samples from the distribution given by the data.  The same is true for the mixture
parameters.

Your code does this.  I just think that this is a hard point for people to understand in these
techniques. 
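
To make the point concrete, here is a minimal sketch of one iteration in which all three steps
are draws rather than optimizations. It is written in Python, not the Java of the patch, and it
assumes a 1-D Gaussian mixture with a fixed number of clusters k, a known standard deviation, a
normal prior on each cluster mean, and a symmetric Dirichlet(alpha/k) prior on the mixture
weights. Every function name and default value below is hypothetical and is not taken from
MAHOUT-30.patch.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sample_assignment(x, weights, means, sd, rng):
    """Pick a cluster at random, biased by weight * pdf. This is NOT the argmax."""
    scores = [w * normal_pdf(x, m, sd) for w, m in zip(weights, means)]
    # Guard against every score underflowing to zero for far-away points.
    scores = [max(s, 1e-300) for s in scores]
    return rng.choices(range(len(scores)), weights=scores, k=1)[0]


def sample_mean(points, sd, prior_mean, prior_sd, rng):
    """Draw a cluster mean from its posterior instead of taking the best estimate.

    Assumes a normal prior on the mean and a known standard deviation sd.
    """
    prior_prec = 1.0 / prior_sd ** 2
    post_prec = prior_prec + len(points) / sd ** 2
    post_mean = (prior_prec * prior_mean + sum(points) / sd ** 2) / post_prec
    return rng.gauss(post_mean, math.sqrt(1.0 / post_prec))


def sample_weights(counts, alpha, rng):
    """Draw mixture weights from Dirichlet(alpha/k + counts) via normalized gammas."""
    k = len(counts)
    draws = [rng.gammavariate(alpha / k + c, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def gibbs_step(data, weights, means, sd=1.0, alpha=1.0, rng=random):
    """One iteration: sample assignments, then models, then mixture weights."""
    k = len(means)
    z = [sample_assignment(x, weights, means, sd, rng) for x in data]
    members = [[x for x, zi in zip(data, z) if zi == j] for j in range(k)]
    # Hypothetical prior on each mean: normal with mean 0 and sd 10.
    new_means = [sample_mean(pts, sd, 0.0, 10.0, rng) for pts in members]
    new_weights = sample_weights([len(pts) for pts in members], alpha, rng)
    return z, new_weights, new_means
```

Calling sample_assignment repeatedly on a point that lies midway between two equally weighted
clusters returns both cluster indices over time, which is exactly the behavior that an
"assign to the best cluster" reading would not predict.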

> dirichlet process implementation
> --------------------------------
>
>                 Key: MAHOUT-30
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-30
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Isabel Drost
>         Attachments: MAHOUT-30.patch
>
>
> Copied over from original issue:
> > Further extension can also be made by assuming an infinite mixture model. The implementation
> > is only slightly more difficult and the result is a (nearly) non-parametric clustering algorithm.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

