madlib-dev mailing list archives

From Roman Shaposhnik <...@apache.org>
Subject Re: Contributing GMM and Perceptron to MADLib
Date Tue, 29 Mar 2016 04:12:47 GMT
I can help with that -- stay tuned.

On Mon, Mar 28, 2016 at 8:29 PM, Frank McQuillan <fmcquillan@pivotal.io> wrote:
> Let me figure out how to do this and add Aditya as the owner of that JIRA.
> My initial attempts in ASF infra-land were not quite successful.
>
> Frank
>
> On Mon, Mar 28, 2016 at 4:54 PM, Rahul Iyer <riyer@pivotal.io> wrote:
>>
>> @Frank, Roman: I believe Aditya needs to be added as a developer to the
>> MADlib project before a JIRA can be assigned to him. Is this only
>> available to the lead/owner?
>>
>> On Mon, Mar 28, 2016 at 3:49 PM, Aditya Nain <adityanain1@gmail.com> wrote:
>>>
>>> Hi Rahul,
>>>
>>> I didn't have an ID, so I created one.
>>> My ID is: Aditya Nain
>>>
>>> Thanks,
>>> Aditya
>>>
>>> On Mon, Mar 28, 2016 at 6:40 PM, Rahul Iyer <riyer@pivotal.io> wrote:
>>>
>>> > I can assign this to you, but you need to have an account on
>>> > https://issues.apache.org. If you already have an account, please
>>> > send your ID; I wasn't able to find you using just your name.
>>> >
>>> > On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain <adityanain1@gmail.com> wrote:
>>> >
>>> > > Hi Rahul,
>>> > >
>>> > > Thanks for the reply!
>>> > >
>>> > > I am working on implementing a Gaussian Mixture Model, assuming that
>>> > > the covariance matrix is the same for all the Gaussians.
>>> > > The JIRA that deals with GMM is MADLIB-410:
>>> > >
>>> > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
>>> > >
>>> > > Can this be assigned to me, or how do I get it assigned to me?
>>> > >
>>> > > Thanks,
>>> > > Aditya
>>> > >
>>> > > On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer <riyer@pivotal.io> wrote:
>>> > >
>>> > > > Hi Aditya,
>>> > > >
>>> > > > Welcome to the MADlib community!
>>> > > >
>>> > > > Gaussian mixture models are extremely useful, and we would heartily
>>> > > > welcome a contribution for GMM. The SQLEM paper might be
>>> > > > oversimplifying the capabilities of the database (e.g., its
>>> > > > assumption that there is no array type is unnecessary for
>>> > > > PostgreSQL). You could speed things up (both development time and
>>> > > > execution time) by writing some of the functions in C++. K-means is
>>> > > > an example of how clustering is implemented in MADlib.
>>> > > > IMO, assuming the same covariance matrix is reasonable. We could
>>> > > > extend the capabilities after the initial implementation is complete.
>>> > > >
>>> > > > There was some work started a long time ago that built perceptrons
>>> > > > using the convex framework (link
>>> > > > <https://github.com/iyerr3/madlib/tree/mlp>). There are still some
>>> > > > bugs in that code, since the trained network isn't converging. You
>>> > > > could start there or build a new module; either way, an MLP module
>>> > > > is frequently requested by the data science community.
>>> > > >
>>> > > > I would suggest starting with Gaussian mixtures and then moving on
>>> > > > to perceptrons once the GMM work is complete.
>>> > > >
>>> > > > Feel free to ask questions on this forum. Looking forward to
>>> > > > collaborating with you.
>>> > > >
>>> > > > Best,
>>> > > > Rahul
>>> > > >
>>> > > > On Thu, Mar 17, 2016 at 2:08 PM, Aditya Nain <adityanain1@gmail.com> wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > My name is Aditya Nain, and I am a graduate student at the
>>> > > > > University of Florida. I have been learning MADlib for a while and
>>> > > > > want to contribute to it. I went through some of the open stories
>>> > > > > in JIRA and started working on MADLIB-410:
>>> > > > >
>>> > > > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
>>> > > > >
>>> > > > > which is about implementing a Gaussian Mixture Model using the
>>> > > > > Expectation Maximization (EM) algorithm.
>>> > > > >
>>> > > > > I came across the following paper while searching for a distributed
>>> > > > > EM algorithm that can be implemented in MADlib:
>>> > > > >
>>> > > > > Carlos Ordonez, Paul Cereghini. "SQLEM: fast clustering in SQL
>>> > > > > using the EM algorithm." ACM SIGMOD Record, Volume 29, Issue 2,
>>> > > > > June 2000, pages 559-570.
>>> > > > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564
>>> > > > >
>>> > > > > I thought of implementing the approach discussed in the paper, but
>>> > > > > the paper assumes that the covariance matrix is the same for all
>>> > > > > the clusters (i.e., all the Gaussian distributions share a single
>>> > > > > covariance matrix). So I wanted to know the community's opinion on
>>> > > > > whether it's fine to go with the assumption made in the paper and
>>> > > > > implement it in MADlib.
>>> > > > >
>>> > > > > Also, MADlib currently doesn't have an implementation of a
>>> > > > > perceptron, nor did I find any open story related to it in JIRA. I
>>> > > > > came across the following paper, which describes a distributed
>>> > > > > training algorithm for the perceptron:
>>> > > > >
>>> > > > > Ryan McDonald, Keith Hall, Gideon Mann. "Distributed training
>>> > > > > strategies for the structured perceptron."
>>> > > > > http://dl.acm.org/citation.cfm?id=1858068
>>> > > > >
>>> > > > > Would it be useful to have a distributed implementation of the
>>> > > > > perceptron in MADlib?
>>> > > > >
>>> > > > > Thanks,
>>> > > > > Aditya
>>> > > > >
>>> > > >
>>> > >
>>> >
>>
>>
>
