madlib-dev mailing list archives

From Aditya Nain <adityana...@gmail.com>
Subject Re: Contributing GMM and Perceptron to MADLib
Date Mon, 28 Mar 2016 22:31:00 GMT
Hi Rahul,

Thanks for the reply!

I am working on implementing a Gaussian mixture model (GMM), assuming that the
covariance matrix is the same for all the Gaussians.
The JIRA issue that deals with GMM is MADLIB-410:
https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
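
To make the shared-covariance assumption concrete, here is a minimal sketch
of EM for a one-dimensional mixture in which every component uses the same
variance. It is plain Python for illustration only: it is not the SQLEM
formulation and not MADlib code, and the function name, the quantile-based
initialisation, and the synthetic data are all assumptions of this sketch.

```python
import math
import random


def em_shared_variance(xs, k, iters=100):
    """EM for a 1-D Gaussian mixture whose k components share one variance."""
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic start (an assumption of this sketch): place the initial
    # means at evenly spaced quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    weights = [1.0 / k] * k
    centre = sum(xs) / n
    var = max(sum((x - centre) ** 2 for x in xs) / n, 1e-9)

    for _ in range(iters):
        # E-step: responsibility of each component for each point. Because
        # the variance is shared, the 1/sqrt(2*pi*var) factor is identical
        # for all components and cancels in the normalisation.
        resp = []
        for x in xs:
            logs = [math.log(weights[j]) - (x - means[j]) ** 2 / (2.0 * var)
                    for j in range(k)]
            top = max(logs)  # subtract the max for numerical stability
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: update mixing weights and means per component, then pool
        # the weighted squared deviations of all components into one variance.
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j]
                 for j in range(k)]
        var = max(sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs) for j in range(k)) / n, 1e-9)
    return weights, means, var


# Synthetic example: two clusters centred at 0 and 5 with unit variance.
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(300)]
        + [rng.gauss(5.0, 1.0) for _ in range(300)])
weights, means, var = em_shared_variance(data, 2)
print(weights, means, var)
```

The only difference from the general case is the last line of the M-step,
where a single pooled variance replaces one variance per component. In more
than one dimension the same pooling would apply to the covariance matrix.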

Can this be assigned to me, or how do I get it assigned to me?

Thanks,
Aditya

On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer <riyer@pivotal.io> wrote:

> Hi Aditya,
>
> Welcome to the MADlib community!
>
> Gaussian mixture models are extremely useful and we would heartily welcome
> a contribution for them. The SQLEM paper might be oversimplifying the
> capabilities of the database (e.g. assuming there is no array type is
> unnecessary for PostgreSQL). You could speed things up (both dev time and
> execution time) by writing some of the functions in C++. K-means is an
> example of how clustering is implemented in MADlib.
> IMO, assuming the same covariance matrix is reasonable. We could extend the
> capabilities after the initial implementation is complete.
>
> There was some work started a long time ago that built perceptrons using
> the convex framework (link <https://github.com/iyerr3/madlib/tree/mlp>).
> There are still some bugs in that code since the trained network isn't
> converging. You could start there or build a new module. Either way, an
> MLP module is frequently requested by the data science community.
>
> I would suggest starting with Gaussian mixtures and then moving to
> perceptrons once the GMM work is complete.
>
> Feel free to ask questions on this forum. Looking forward to collaborating
> with you.
>
> Best,
> Rahul
>
> On Thu, Mar 17, 2016 at 2:08 PM, Aditya Nain <adityanain1@gmail.com>
> wrote:
>
> > Hi,
> >
> > My name is Aditya Nain, and I am a graduate student at the University of
> > Florida.
> > I have been learning MADlib for a while and want to contribute to MADlib.
> > I went through some of the open stories in JIRA and started working on
> > MADLIB-410:
> >
> > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> >
> > which is about implementing a Gaussian mixture model using the Expectation
> > Maximization (EM) algorithm.
> >
> > I came across the following paper while searching for a distributed EM
> > algorithm that can be implemented in MADlib.
> >
> > Carlos Ordonez, Paul Cereghini, "SQLEM: fast clustering in SQL using the
> > EM algorithm", ACM SIGMOD Record, Volume 29, Issue 2, June 2000, pages
> > 559-570.
> > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564
> >
> > I thought of implementing the approach discussed in the paper, but the
> > paper assumes that the covariance matrix is the same for all the clusters
> > (i.e. the covariance matrix is the same for all the Gaussian
> > distributions). So, I wanted to know whether the community thinks it is
> > fine to go with the assumption made in the paper and implement it in
> > MADlib.
> >
> > Also, MADlib currently doesn't have an implementation of a perceptron,
> > nor did I find any open story related to it in JIRA. I came across the
> > following paper, which describes a distributed algorithm for the
> > perceptron:
> >
> > Ryan McDonald, Keith Hall, Gideon Mann, "Distributed training strategies
> > for the structured perceptron"
> > http://dl.acm.org/citation.cfm?id=1858068
> >
> > Would it be useful to have a distributed implementation of the perceptron
> > in MADlib?
> >
> > Thanks,
> > Aditya
> >
>
