mahout-dev mailing list archives

From "Isabel Drost (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAHOUT-4) Simple prototype for Expectation Maximization (EM)
Date Mon, 10 Mar 2008 07:13:46 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-4?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576885#action_12576885 ]

Isabel Drost commented on MAHOUT-4:
-----------------------------------

Your plan of first trying to understand the non-distributed version and then map-reducing
the algorithm sounds great :) Some comments from my point of view:

You might want to choose more descriptive variable names than u, s and z, and give the
mapping to the names used in the paper in a comment. That should make it easier for readers
of your code to distinguish users, stories and clusters (z).
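
To illustrate the naming suggestion, here is a minimal sketch. The class and field names are hypothetical and not taken from the patch; the only thing carried over is the u/s/z notation, recorded in the comments.

```java
// Hypothetical sketch: names are illustrative, not taken from the patch.
final class PlsiDimensions {
    /** Number of users; "u" in the paper's notation. */
    final int numUsers;
    /** Number of stories; "s" in the paper's notation. */
    final int numStories;
    /** Number of latent clusters; "z" in the paper's notation. */
    final int numClusters;

    PlsiDimensions(int numUsers, int numStories, int numClusters) {
        this.numUsers = numUsers;
        this.numStories = numStories;
        this.numClusters = numClusters;
    }
}
```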

I think you might want to inline the initialize method. For me personally, this makes it
easier to follow what is done in the constructors. As for the default constructor, you could
simply delegate to PLSI_engine(u, s, z), passing the default values for initialization.
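
A rough sketch of what such constructor delegation could look like. The class name, the default sizes and the uniform starting values are all assumptions made for illustration; the actual patch may initialize its tables differently.

```java
import java.util.Arrays;

// Hypothetical sketch: class name, defaults and uniform initialization are assumptions.
final class PlsiEngineSketch {
    private static final int DEFAULT_USERS = 10;
    private static final int DEFAULT_STORIES = 20;
    private static final int DEFAULT_CLUSTERS = 3;

    private final double[][] pClusterGivenUser;   // P(z|u), indexed [user][cluster]
    private final double[][] pStoryGivenCluster;  // P(s|z), indexed [cluster][story]

    /** The default constructor only supplies default sizes and delegates. */
    PlsiEngineSketch() {
        this(DEFAULT_USERS, DEFAULT_STORIES, DEFAULT_CLUSTERS);
    }

    /** All initialization is inlined here, so there is one place to read. */
    PlsiEngineSketch(int numUsers, int numStories, int numClusters) {
        pClusterGivenUser = new double[numUsers][numClusters];
        for (double[] row : pClusterGivenUser) {
            Arrays.fill(row, 1.0 / numClusters);
        }
        pStoryGivenCluster = new double[numClusters][numStories];
        for (double[] row : pStoryGivenCluster) {
            Arrays.fill(row, 1.0 / numStories);
        }
    }

    double clusterGivenUser(int user, int cluster) {
        return pClusterGivenUser[user][cluster];
    }

    double storyGivenCluster(int cluster, int story) {
        return pStoryGivenCluster[cluster][story];
    }
}
```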

Concerning the method that calculates P_z_u_s: how many clusters do you expect? It seems
like this computation could become numerically unstable for very large numbers of clusters.
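
One common way to guard against this kind of problem is to normalize in log space (the log-sum-exp trick). The sketch below assumes the posterior has the usual PLSI form, P(z|u,s) proportional to P(z|u) * P(s|z), and that both factors are stored as logarithms. The class and method names are hypothetical, and the patch may compute the quantity differently.

```java
// Hypothetical sketch: assumes P(z|u,s) is proportional to P(z|u) * P(s|z).
final class StablePosterior {
    /**
     * Returns P(z|u,s) for every cluster z, given log P(z|u) and log P(s|z)
     * for one fixed (user, story) pair.
     */
    static double[] posterior(double[] logPClusterGivenUser, double[] logPStoryGivenCluster) {
        int numClusters = logPClusterGivenUser.length;
        if (logPStoryGivenCluster.length != numClusters) {
            throw new IllegalArgumentException("arrays must have one entry per cluster");
        }
        double[] logJoint = new double[numClusters];
        double max = Double.NEGATIVE_INFINITY;
        for (int z = 0; z < numClusters; z++) {
            logJoint[z] = logPClusterGivenUser[z] + logPStoryGivenCluster[z];
            max = Math.max(max, logJoint[z]);
        }
        double[] result = new double[numClusters];
        if (max == Double.NEGATIVE_INFINITY) {
            // Every cluster has zero probability; fall back to uniform (an assumption).
            java.util.Arrays.fill(result, 1.0 / numClusters);
            return result;
        }
        // Subtracting the maximum keeps exp() from underflowing to zero for all terms.
        double sum = 0.0;
        for (int z = 0; z < numClusters; z++) {
            result[z] = Math.exp(logJoint[z] - max);
            sum += result[z];
        }
        for (int z = 0; z < numClusters; z++) {
            result[z] /= sum;
        }
        return result;
    }
}
```

Multiplying the raw probabilities in this situation would underflow to 0/0 as soon as the products become very small, whereas the shifted exponentials always contain at least one term equal to 1.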

It would be nice if you could provide some unit tests to prove that your code is working correctly.

I know EM as a rather general principle; your implementation seems rather focused on the
setup of the Google News clustering solution. I was wondering whether it would be possible
to generalize the implementation a little but still support the news personalization use case.
Maybe others would like to reuse a general EM framework but not the exact same formulas that
you used. I don't know whether that is possible, or whether it can be done in a way that is
easy to read.
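
Just to make the idea concrete, a general framework could separate the iteration loop from the model-specific formulas. The following is only a sketch under that assumption; the interface, its method names and the convergence test on the log-likelihood are all hypothetical and not part of the patch.

```java
// Hypothetical sketch of a model-independent EM loop; all names are assumptions.
interface EmModel<P, E> {
    /** E-step: compute expected latent assignments under the current parameters. */
    E expectation(P parameters);

    /** M-step: re-estimate the parameters from the expectations. */
    P maximization(E expectations);

    /** Log-likelihood of the data under the given parameters, used to detect convergence. */
    double logLikelihood(P parameters);
}

final class EmDriver {
    /** Alternates E-step and M-step until the log-likelihood stops changing. */
    static <P, E> P run(EmModel<P, E> model, P initial, int maxIterations, double tolerance) {
        P parameters = initial;
        double previous = model.logLikelihood(parameters);
        for (int i = 0; i < maxIterations; i++) {
            E expectations = model.expectation(parameters);
            parameters = model.maximization(expectations);
            double current = model.logLikelihood(parameters);
            if (Math.abs(current - previous) < tolerance) {
                break;
            }
            previous = current;
        }
        return parameters;
    }
}
```

With a split like this, the news personalization formulas would live in one EmModel implementation, and other users could plug in their own formulas without touching the loop.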

> Simple prototype for Expectation Maximization (EM)
> --------------------------------------------------
>
>                 Key: MAHOUT-4
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-4
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ankur
>         Attachments: Mahout_EM.patch
>
>
> Create a simple prototype implementing Expectation Maximization - EM that demonstrates
> the algorithm functionality given a set of (user, click-url) data.
> The prototype should be functionally complete and should serve as a basis for the Map-Reduce
> version of the EM algorithm.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

