From: Aditya Nain
Date: Mon, 28 Mar 2016 18:49:40 -0400
Subject: Re: Contributing GMM and Perceptron to MADLib
To: dev@madlib.incubator.apache.org

Hi Rahul, I
didn't have an id, so I created one now. My id is: Aditya Nain

Thanks,
Aditya

On Mon, Mar 28, 2016 at 6:40 PM, Rahul Iyer wrote:

> I can assign this to you, but you need to have an account at
> https://issues.apache.org.
> If you already have an account, then please send your id - I wasn't able to
> find you just using your name.
>
> On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain wrote:
>
> > Hi Rahul,
> >
> > Thanks for the reply!
> >
> > I am working on implementing a Gaussian mixture model, assuming that the
> > covariance matrix is the same for all the Gaussians.
> > The JIRA that deals with GMM is MADLIB-410:
> > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> >
> > Can this be assigned to me, or how do I get it assigned to me?
> >
> > Thanks,
> > Aditya
> >
> > On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer wrote:
> >
> > > Hi Aditya,
> > >
> > > Welcome to the MADlib community!
> > >
> > > Gaussian mixture models are extremely useful and we would heartily
> > > welcome a contribution for them. The SQLEM paper might be oversimplifying
> > > the capabilities of the database (e.g. assuming there is no array type is
> > > unnecessary for PostgreSQL). You could speed things up (both dev time and
> > > execution time) by writing some of the functions in C++. K-means is an
> > > example of how clustering is implemented.
> > > IMO, assuming the same covariance matrix is reasonable. We could extend
> > > the capabilities after the initial implementation is complete.
> > >
> > > There was some work started a long time ago that built perceptrons using
> > > the convex framework (link).
> > > There are still some bugs in that code since the trained network isn't
> > > converging. You could start there or build a new module - either way, an
> > > MLP module is frequently requested by the data science community.
> > >
> > > I would suggest starting with Gaussian mixtures and then moving to
> > > perceptrons once the GMM work is completed.
> > >
> > > Feel free to ask questions on this forum. Looking forward to
> > > collaborating with you.
> > >
> > > Best,
> > > Rahul
> > >
> > > On Thu, Mar 17, 2016 at 2:08 PM, Aditya Nain wrote:
> > >
> > > > Hi,
> > > >
> > > > My name is Aditya Nain, and I am a graduate student at the University
> > > > of Florida.
> > > > I have been learning MADlib for a while and want to contribute to it.
> > > > I went through some of the open stories in JIRA and started working on
> > > > MADLIB-410:
> > > >
> > > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> > > >
> > > > which is about implementing a Gaussian mixture model using the
> > > > Expectation Maximization (EM) algorithm.
> > > >
> > > > I came across the following paper while searching for a distributed EM
> > > > algorithm that can be implemented in MADlib.
> > > >
> > > > Carlos Ordonez, Paul Cereghini, "SQLEM: fast clustering in SQL using
> > > > the EM algorithm", ACM SIGMOD Record, Volume 29, Issue 2, June 2000,
> > > > Pages 559-570.
> > > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564
> > > >
> > > > I thought of implementing the approach discussed in the paper, but the
> > > > paper assumes that the covariance matrix is the same for all the
> > > > clusters (i.e., the covariance matrix is the same for all the Gaussian
> > > > distributions). So, I wanted to know the opinion of the community on
> > > > whether it's fine to go with the assumption made in the paper and
> > > > implement it in MADlib.
> > > >
> > > > Also, MADlib currently doesn't have an implementation of a perceptron,
> > > > nor did I find any open story related to it in JIRA.
> > > > I came across the following paper, which describes a distributed
> > > > algorithm for the perceptron:
> > > >
> > > > Ryan McDonald, Keith Hall, Gideon Mann, "Distributed training
> > > > strategies for the structured perceptron"
> > > > http://dl.acm.org/citation.cfm?id=1858068
> > > >
> > > > Would it be useful to have a distributed implementation of the
> > > > perceptron in MADlib?
> > > >
> > > > Thanks,
> > > > Aditya