mahout-dev mailing list archives

From "Jeff Eastman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-30) dirichlet process implementation
Date Sat, 15 Nov 2008 17:19:46 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647880#action_12647880 ]

Jeff Eastman commented on MAHOUT-30:
------------------------------------

The above patch makes several improvements over the previous version:
* refactored state updating and cluster sampling into DirichletState
* refactored creation of the list of models into ModelDistribution
* moved state parameters from DirichletCluster to DirichletState
* refactored the count into the model
* changed List<Model> to Model[]
* added significance filtering to the printed output
* increased the number of iterations to 30 to demonstrate better convergence
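For readers unfamiliar with the structure, here is a minimal sketch of how the pieces listed above might fit together in one sampling iteration. It is not the code in the patch: only the names DirichletState, ModelDistribution and Model come from the list, and every field, method and signature is hypothetical. It assumes 1-D normal models, a fixed number K of models with a symmetric Dirichlet(alpha/K) prior on the mixture weights (a common finite approximation to a Dirichlet process mixture), and it re-estimates each model from its assigned points by moments instead of drawing from a true posterior, which is a simplification.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch (not the MAHOUT-30 patch) of one sampling iteration for a
 * truncated Dirichlet mixture of 1-D normal models. Class names echo the JIRA
 * comment; all fields, methods and signatures are invented for illustration.
 */
class DirichletSketch {

  /** Normal model that keeps its own point count, echoing "refactored the count into the model". */
  static final class Model {
    double mean, sd;
    int count;
    private double sum, sumSq;

    Model(double mean, double sd) { this.mean = mean; this.sd = sd; }

    double logPdf(double x) {
      double z = (x - mean) / sd;
      return -0.5 * z * z - Math.log(sd * Math.sqrt(2 * Math.PI));
    }

    void reset() { count = 0; sum = 0; sumSq = 0; }

    void observe(double x) { count++; sum += x; sumSq += x * x; }

    /** Simplification: moment estimate from assigned points instead of a posterior draw. */
    void computeParameters() {
      if (count < 2) return; // empty or singleton models keep their current parameters
      mean = sum / count;
      sd = Math.sqrt(Math.max(sumSq / count - mean * mean, 1e-6));
    }
  }

  /** Draws an initial Model[] from an assumed prior: mean ~ N(0, 3^2), sd = 1. */
  static final class ModelDistribution {
    static Model[] sampleFromPrior(int k, Random rnd) {
      Model[] models = new Model[k];
      for (int i = 0; i < k; i++) models[i] = new Model(3.0 * rnd.nextGaussian(), 1.0);
      return models;
    }
  }

  /** Marsaglia-Tsang Gamma(shape, 1) sampler, with the U^(1/a) boost for shape < 1. */
  static double sampleGamma(double shape, Random rnd) {
    if (shape < 1.0) return sampleGamma(shape + 1.0, rnd) * Math.pow(rnd.nextDouble(), 1.0 / shape);
    double d = shape - 1.0 / 3.0, c = 1.0 / Math.sqrt(9.0 * d);
    while (true) {
      double x = rnd.nextGaussian(), v = 1.0 + c * x;
      if (v <= 0) continue;
      v = v * v * v;
      if (Math.log(rnd.nextDouble()) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
    }
  }

  /** Holds the models and mixture weights and performs the per-iteration updates. */
  static final class DirichletState {
    final Model[] models;
    final double[] mixture;
    final double alpha;

    DirichletState(Model[] models, double alpha) {
      this.models = models;
      this.alpha = alpha;
      this.mixture = new double[models.length];
      Arrays.fill(mixture, 1.0 / models.length);
    }

    /** Samples a model index with probability proportional to mixture[k] * pdf_k(x). */
    int sampleIndex(double x, Random rnd) {
      double[] p = new double[models.length];
      double max = Double.NEGATIVE_INFINITY, total = 0;
      for (int k = 0; k < p.length; k++) {
        p[k] = Math.log(mixture[k]) + models[k].logPdf(x);
        max = Math.max(max, p[k]);
      }
      // Working in log space and subtracting the max avoids underflow for distant points.
      for (int k = 0; k < p.length; k++) { p[k] = Math.exp(p[k] - max); total += p[k]; }
      double u = rnd.nextDouble() * total;
      for (int k = 0; k < p.length; k++) { u -= p[k]; if (u < 0) return k; }
      return p.length - 1;
    }

    void iterate(double[] data, Random rnd) {
      for (Model m : models) m.reset();
      for (double x : data) models[sampleIndex(x, rnd)].observe(x);
      // Resample weights from Dirichlet(alpha/K + count_k) by normalizing Gamma draws.
      double total = 0;
      for (int k = 0; k < models.length; k++) {
        mixture[k] = sampleGamma(alpha / models.length + models[k].count, rnd);
        total += mixture[k];
      }
      for (int k = 0; k < models.length; k++) mixture[k] /= total;
      for (Model m : models) m.computeParameters();
    }
  }
}
```

A caller would build the state from ModelDistribution.sampleFromPrior, call iterate repeatedly (for example 30 times, matching the iteration count above) and then read each model's count and parameters. Keeping the count inside Model and holding the models in a Model[] mirror two of the items in the list.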

The algorithm now produces the following output when run over 10,000 points:
* Using fixed random seed for repeatability.
* testDirichletCluster10000
* Generating 4000 samples m=[1.0, 1.0] sd=3.0
* Generating 3000 samples m=[1.0, 0.0] sd=0.1
* Generating 3000 samples m=[0.0, 1.0] sd=0.1
* sample[0]= normal(n=4037 m=[0.80, 0.73] sd=1.40), normal(n=3844 m=[0.51, 0.51] sd=0.68), normal(n=1092 m=[0.51, 0.47] sd=0.53), normal(n=794 m=[1.26, 1.60] sd=2.22),
* sample[1]= normal(n=4562 m=[0.72, 0.68] sd=1.25), normal(n=2992 m=[0.48, 0.52] sd=0.58), normal(n=1022 m=[0.67, 0.31] sd=0.53), normal(n=1227 m=[1.17, 1.41] sd=2.13),
* sample[2]= normal(n=4377 m=[0.66, 0.61] sd=1.08), normal(n=2592 m=[0.28, 0.71] sd=0.51), normal(n=1057 m=[1.04, -0.06] sd=0.25), normal(n=1831 m=[1.15, 1.26] sd=2.05),
* sample[3]= normal(n=4302 m=[0.74, 0.36] sd=0.80), normal(n=2075 m=[-0.00, 1.01] sd=0.32), normal(n=793 m=[1.04, -0.05] sd=0.20), normal(n=2694 m=[1.04, 1.17] sd=1.93),
* sample[4]= normal(n=3602 m=[0.80, 0.21] sd=0.58), normal(n=1923 m=[-0.05, 1.05] sd=0.26), normal(n=621 m=[1.03, -0.06] sd=0.19), normal(n=3677 m=[0.94, 1.09] sd=1.77),
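The test that produced this output is not included in the message. The following is a hypothetical sketch of a harness that would produce output of the same shape: seeded generation of the three clusters listed in the header, plus a significance filter and a formatter for the sample lines. The class and method names, the seed value, the per-axis reading of sd and the 5% threshold are all assumptions. One observation supports the idea that small models are dropped before printing: the counts in each printed sample add up to fewer than 10,000 points (9,767 in sample[0], 9,823 in sample[4]).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

/** Hypothetical harness mimicking the test output above; not the patch's test code. */
class OutputSketch {

  /** Appends num points from a 2-D normal with mean m; sd is assumed to apply per axis. */
  static void generate(List<double[]> out, int num, double[] m, double sd, Random rnd) {
    for (int i = 0; i < num; i++) {
      out.add(new double[] {m[0] + sd * rnd.nextGaussian(), m[1] + sd * rnd.nextGaussian()});
    }
  }

  /** The three clusters from the output header; the same seed always yields the same points. */
  static List<double[]> testData(long seed) {
    Random rnd = new Random(seed);
    List<double[]> points = new ArrayList<>();
    generate(points, 4000, new double[] {1.0, 1.0}, 3.0, rnd);
    generate(points, 3000, new double[] {1.0, 0.0}, 0.1, rnd);
    generate(points, 3000, new double[] {0.0, 1.0}, 0.1, rnd);
    return points;
  }

  /** Minimal stand-in for a sampled normal model: point count, mean and sd. */
  static final class ModelSummary {
    final int n;
    final double[] m;
    final double sd;

    ModelSummary(int n, double[] m, double sd) { this.n = n; this.m = m; this.sd = sd; }

    @Override
    public String toString() {
      return String.format(Locale.ROOT, "normal(n=%d m=[%.2f, %.2f] sd=%.2f)", n, m[0], m[1], sd);
    }
  }

  /** Significance filter: keeps models holding at least minFraction of all points. */
  static List<ModelSummary> significant(List<ModelSummary> models, int totalPoints, double minFraction) {
    List<ModelSummary> kept = new ArrayList<>();
    for (ModelSummary s : models) {
      if (s.n >= minFraction * totalPoints) kept.add(s);
    }
    return kept;
  }

  /** Formats one line in the style of the output above, trailing ", " included. */
  static String formatSample(int index, List<ModelSummary> models) {
    StringBuilder sb = new StringBuilder("sample[" + index + "]= ");
    for (ModelSummary s : models) sb.append(s).append(", ");
    return sb.toString();
  }

  public static void main(String[] args) {
    List<double[]> points = testData(1234L); // hypothetical seed value
    System.out.println(points.size()); // 10000
    List<ModelSummary> sampled = List.of(
        new ModelSummary(4037, new double[] {0.80, 0.73}, 1.40),
        new ModelSummary(120, new double[] {5.0, 5.0}, 0.30)); // made-up small model
    // prints "sample[0]= normal(n=4037 m=[0.80, 0.73] sd=1.40), "
    System.out.println(formatSample(0, significant(sampled, points.size(), 0.05)));
  }
}
```

Fixing the seed is what makes runs repeatable: two calls to testData with the same seed return identical points, so changes in the printed samples reflect changes in the algorithm rather than in the data.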


> dirichlet process implementation
> --------------------------------
>
>                 Key: MAHOUT-30
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-30
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Isabel Drost
>            Assignee: Jeff Eastman
>         Attachments: MAHOUT-30.patch, MAHOUT-30b.patch
>
>
> Copied over from original issue:
> > Further extension can also be made by assuming an infinite mixture model. The implementation is only slightly more difficult and the result is a (nearly) non-parametric clustering algorithm.
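The quoted description does not say how the infinite mixture would be represented. One common textbook approach, offered here only as an illustration and not necessarily what the patch does, is the truncated stick-breaking construction of Dirichlet process weights: draw fractions beta_k ~ Beta(1, alpha) and set w_k = beta_k * (1 - beta_1) * ... * (1 - beta_{k-1}), letting the last of K weights absorb whatever stick remains. The class name StickBreaking, the truncation level K and the alpha values below are assumptions for the sketch.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch: truncated stick-breaking weights for a Dirichlet process. */
class StickBreaking {

  /**
   * Returns k mixture weights for concentration alpha. Each stick fraction is
   * drawn from Beta(1, alpha) by inverse-CDF sampling, beta = 1 - U^(1/alpha);
   * the last weight takes the remaining stick so the weights sum to 1.
   */
  static double[] weights(int k, double alpha, Random rnd) {
    double[] w = new double[k];
    double remaining = 1.0; // length of stick not yet broken off
    for (int i = 0; i < k - 1; i++) {
      double beta = 1.0 - Math.pow(rnd.nextDouble(), 1.0 / alpha);
      w[i] = beta * remaining;
      remaining -= w[i];
    }
    w[k - 1] = remaining;
    return w;
  }

  public static void main(String[] args) {
    Random rnd = new Random(7);
    // Small alpha tends to put most of the mass on a few components.
    System.out.println(Arrays.toString(weights(20, 0.5, rnd)));
    // Large alpha tends to spread the mass over many components.
    System.out.println(Arrays.toString(weights(20, 20.0, rnd)));
  }
}
```

Since E[beta_k] = 1/(1 + alpha), the concentration parameter controls how quickly the weights fall off, so the data and alpha, rather than a fixed cluster count, determine how many components end up with significant mass.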

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

