[ https://issues.apache.org/jira/browse/MAHOUT4?page=com.atlassian.jira.plugin.system.issuetabpanels:commenttabpanel&focusedCommentId=12586287#action_12586287
]
Isabel Drost commented on MAHOUT4:

Adding the comments sent to the list here as well for further reference.
> 
> So here is a short writeup in my words, please feel free to
> fill any gaps/errors found
I will try to do so from my perspective, maybe others can add their views.
> Expectation Maximization for clustering
> 
> Let
> z = unobserved data, clusters in our case.
> y = observed data, points in our case.
>
> p(y1|z1) + p(y2|z1) + p(y3|z1) + p(y4|z1) = 1
> p(z1) + p(z2) = 1
Looks correct to me.
> E-Step.
> 
> M-Step
> 
I could not find an error in either of the two steps so far.
> Questions
> =========
> 1. When and how do we recompute the cluster centers ?
EM does not work with explicit cluster centers. In k-means you iterate two
steps: assigning points to centers and recomputing the centers. In EM you
again iterate two steps. In the first, you compute the probability of each
point belonging to each cluster (so you do not assign it hard to one cluster;
you only say that with probability P it belongs to a given cluster). In the
second, you recompute the parameters of each cluster: the cluster center is
influenced by every point, but each point is weighted by its probability of
belonging to this cluster.
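
The two EM steps described above can be sketched roughly as follows. This is
only an illustration, not the code from the attached patch: it assumes
one-dimensional points, Gaussian clusters with a fixed shared variance, and
the hypothetical helper names e_step and m_step.

```python
import math

def e_step(points, means, weights, sigma=1.0):
    """Return resp[i][k] = P(cluster k | point i) for 1-D Gaussian clusters."""
    resp = []
    for y in points:
        # Prior weight times Gaussian likelihood. The normalizing constant is
        # omitted because it is the same for all clusters (shared sigma).
        scores = [w * math.exp(-((y - m) ** 2) / (2 * sigma ** 2))
                  for m, w in zip(means, weights)]
        total = sum(scores)
        resp.append([s / total for s in scores])
    return resp

def m_step(points, resp):
    """Re-estimate each cluster's mean and weight from the soft assignments."""
    n_clusters = len(resp[0])
    means, weights = [], []
    for k in range(n_clusters):
        # Effective (fractional) number of points belonging to cluster k.
        mass = sum(r[k] for r in resp)
        # Every point contributes to the mean, weighted by its probability.
        means.append(sum(r[k] * y for r, y in zip(resp, points)) / mass)
        weights.append(mass / len(points))
    return means, weights

# Toy data with two well-separated groups (made up for illustration).
points = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
means, weights = [0.0, 10.0], [0.5, 0.5]
for _ in range(20):
    resp = e_step(points, means, weights)
    means, weights = m_step(points, resp)
```

Note that no point is ever assigned to a single cluster; the means move only
because the per-point probabilities change from one iteration to the next.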
> 2. As per my understanding points and clusters are simply labels with some
> conditional probability assigned to them. A distance metric like one
> used in k-means is nowhere involved, is that correct ?
Yes and no: technically no; conceptually, your computation of the probability
of assigning a point to a cluster should be based on the point's distance to
the cluster.
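
One way to see the connection to distance is the sketch below. It assumes
spherical Gaussian clusters with equal prior weights and a shared sigma, which
is only one possible model; membership_probs and squared_distance are
hypothetical names chosen for this example.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def membership_probs(point, centers, sigma=1.0):
    """Probability of the point belonging to each cluster.

    Assumes equal cluster priors and a shared spherical variance, so the
    result depends only on the distances to the cluster centers.
    """
    d2 = [squared_distance(point, c) for c in centers]
    # Shift by the smallest distance to avoid underflow in exp().
    d_min = min(d2)
    scores = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in d2]
    total = sum(scores)
    return [s / total for s in scores]

# The closer center receives the larger probability.
probs = membership_probs((1.0, 1.0), [(0.0, 0.0), (4.0, 4.0)])
# A point equally far from both centers is split evenly.
tie = membership_probs((2.0, 2.0), [(0.0, 0.0), (4.0, 4.0)])
```

With other cluster models the probability need not be a function of Euclidean
distance, which is why the answer is only "conceptually" yes.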
I hope I did not cause more confusion than help. Maybe others can
correct me or clarify what I left unclear...
Isabel
> Simple prototype for Expectation Maximization (EM)
> 
>
> Key: MAHOUT4
> URL: https://issues.apache.org/jira/browse/MAHOUT4
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ankur
> Attachments: Mahout_EM.patch
>
>
> Create a simple prototype implementing Expectation Maximization (EM) that demonstrates
> the algorithm's functionality given a set of (user, clickurl) data.
> The prototype should be functionally complete and should serve as a basis for the MapReduce
> version of the EM algorithm.

