I can assign this to you, but you need to have an account on
https://issues.apache.org.
If you already have an account, please send me your ID - I wasn't able to
find you using just your name.
On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain wrote:
> Hi Rahul,
>
> Thanks for the reply!
>
> I am working on implementing a Gaussian Mixture Model, assuming that the
> covariance matrix is the same for all the Gaussians.
> The JIRA that deals with GMM is MADLIB-410:
> https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
>
> Can this be assigned to me, or how do I get it assigned to me?
>
> Thanks,
> Aditya
>
> On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer wrote:
>
> > Hi Aditya,
> >
> > Welcome to the MADlib community!
> >
> > Gaussian mixture models are extremely useful, and we would heartily
> > welcome a contribution for them. The SQLEM paper might be oversimplifying
> > the capabilities of the database (e.g. assuming there is no array type is
> > unnecessary for PostgreSQL). You could speed things up (both dev time and
> > execution time) by writing some of the functions in C++. K-means is an
> > example of how clustering is implemented.
> > IMO, assuming the same covariance matrix is reasonable. We could extend
> > the capabilities after the initial implementation is complete.
> >
> > There was some work started a long time ago that built perceptrons using
> > the convex framework (link ).
> > There are still some bugs in that code since the trained network isn't
> > converging. You could start there or build a new module - either way, an
> > MLP module is frequently demanded by the data science community.
> >
> > I would suggest starting with Gaussian mixtures and then moving to
> > perceptrons once the GMM work is completed.
> >
> > Feel free to ask questions on this forum. Looking forward to
> > collaborating with you.
> >
> > Best,
> > Rahul
> >
> > On Thu, Mar 17, 2016 at 2:08 PM, Aditya Nain
> > wrote:
> >
> > > Hi,
> > >
> > > My name is Aditya Nain, and I am a graduate student at the University of
> > > Florida.
> > > I have been learning MADlib for a while and want to contribute to MADlib.
> > > I went through some of the open stories in JIRA and started working on
> > > MADLIB-410:
> > >
> > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> > >
> > > which is about implementing a Gaussian Mixture Model using the Expectation
> > > Maximization (EM) algorithm.
> > >
> > > I came across the following paper while searching for a distributed EM
> > > algorithm that can be implemented in MADlib.
> > >
> > > Carlos Ordonez, Paul Cereghini, "SQLEM: fast clustering in SQL using the
> > > EM algorithm", ACM SIGMOD Record, Volume 29, Issue 2, June 2000, pages
> > > 559-570.
> > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564
> > >
> > > I thought of implementing the approach discussed in the paper, but the
> > > paper assumes that the covariance matrix is the same for all the
> > > clusters (i.e. the covariance matrix is the same for all the Gaussian
> > > distributions). So, I wanted to know the community's opinion on whether
> > > it's fine to go with the assumption made in the paper and implement it
> > > in MADlib.
> > >
> > > Also, MADlib currently doesn't have an implementation of a perceptron,
> > > nor did I find any open story related to it in JIRA. I came across the
> > > following paper, which discusses a distributed algorithm for the
> > > perceptron:
> > >
> > > Ryan McDonald, Keith Hall, Gideon Mann, "Distributed training strategies
> > > for the structured perceptron"
> > > http://dl.acm.org/citation.cfm?id=1858068
> > >
> > > Would it be useful to have a distributed implementation of a perceptron
> > > in MADlib?
> > >
> > > Thanks,
> > > Aditya
> > >
> >
>