From: Aditya Nain
Date: Thu, 17 Mar 2016 17:08:56 -0400
Subject: Contributing GMM and Perceptron to MADLib
To: dev@madlib.incubator.apache.org

Hi,

My name is Aditya Nain, and I am a graduate student at the University of Florida. I have been learning MADlib for a while and would like to contribute to it. I went through some of the open stories in JIRA and started working on MADLIB-410:

https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB

which is about implementing a Gaussian Mixture Model (GMM) using the Expectation Maximization (EM) algorithm. While searching for a distributed EM algorithm that could be implemented in MADlib, I came across the following paper:

Carlos Ordonez, Paul Cereghini, "SQLEM: fast clustering in SQL using the EM algorithm", ACM SIGMOD Record, Volume 29, Issue 2, June 2000, Pages 559-570.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564

I thought of implementing the approach discussed in the paper, but the paper assumes that the covariance matrix is the same for all the clusters (i.e., all the Gaussian distributions share a single covariance matrix). So I would like to know the community's opinion: is it fine to go with the assumption made in the paper and implement it in MADlib?

Also, MADlib currently doesn't have an implementation of a perceptron, and I did not find any open story related to it in JIRA. I came across the following paper, which describes a distributed algorithm for the perceptron:

Ryan McDonald, Keith Hall, Gideon Mann, "Distributed training strategies for the structured perceptron"
http://dl.acm.org/citation.cfm?id=1858068

Would it be useful to have a distributed implementation of the perceptron in MADlib?

Thanks,
Aditya
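To make the shared-covariance question above concrete, here is a minimal sketch of EM for a Gaussian mixture in which every component uses the same covariance. It is not the SQLEM algorithm and not MADlib code: the function name `em_tied_gmm` is hypothetical, and restricting the shared covariance to a diagonal is an extra simplification made here to keep the example in plain Python.

```python
import math
import random


def em_tied_gmm(points, k, iterations=50, seed=0):
    """EM for a mixture of k Gaussians sharing one diagonal covariance.

    Hypothetical helper for illustration. Returns (weights, means, shared_var).
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    means = [list(p) for p in rng.sample(points, k)]
    weights = [1.0 / k] * k
    # Start from the per-dimension variance of the whole data set.
    centre = [sum(p[j] for p in points) / n for j in range(d)]
    var = [max(sum((p[j] - centre[j]) ** 2 for p in points) / n, 1e-6)
           for j in range(d)]

    for _ in range(iterations):
        # E-step. Because the covariance is shared, its normalising constant
        # is identical for every component and cancels in the responsibilities.
        resp = []
        for p in points:
            logs = [
                math.log(max(weights[c], 1e-300))
                - 0.5 * sum((p[j] - means[c][j]) ** 2 / var[j] for j in range(d))
                for c in range(k)
            ]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])

        # M-step: per-component weights and means, one pooled variance.
        sizes = [sum(r[c] for r in resp) for c in range(k)]
        for c in range(k):
            if sizes[c] > 0:
                means[c] = [
                    sum(r[c] * p[j] for r, p in zip(resp, points)) / sizes[c]
                    for j in range(d)
                ]
        weights = [s / n for s in sizes]
        var = [
            max(
                sum(r[c] * (p[j] - means[c][j]) ** 2
                    for r, p in zip(resp, points) for c in range(k)) / n,
                1e-6,
            )
            for j in range(d)
        ]
    return weights, means, var
```

The only difference from an unconstrained mixture is in the M-step: the weighted squared deviations from all components are pooled into a single variance estimate instead of one per cluster, so clusters with genuinely different spreads cannot be represented.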
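For the perceptron proposal, the sketch below illustrates one distributed training strategy, iterative parameter mixing: each data shard runs one perceptron pass starting from the current weights, and the per-shard results are averaged before the next pass. It is a sketch under assumptions: it uses a plain binary perceptron rather than the structured perceptron of the cited paper, uniform averaging, and hypothetical function names; the shards are processed in a loop here, whereas in a database each would run on its own segment.

```python
def perceptron_epoch(weights, shard):
    """One pass of the classic perceptron over a shard; labels are +1 or -1."""
    w = list(weights)
    for x, y in shard:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:  # misclassified or on the boundary: update
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def iterative_parameter_mixing(shards, dim, epochs=10):
    """Train per shard from shared weights, then average, once per epoch."""
    w = [0.0] * dim
    for _ in range(epochs):
        # Every shard starts from the same mixed weights.
        local = [perceptron_epoch(w, shard) for shard in shards]
        # Mix: uniform average of the per-shard weight vectors.
        w = [sum(col) / len(local) for col in zip(*local)]
    return w


def predict(w, x):
    """Return +1 or -1 for feature vector x under weights w."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
```

Each feature vector is expected to carry a constant 1.0 as its last entry if a bias term is wanted, and each shard is a list of `(features, label)` pairs.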