Date: Wed, 31 Jul 2013 18:49:49 +0000 (UTC)
From: "Kun Yang (JIRA)"
To: dev@mahout.apache.org
Reply-To: dev@mahout.apache.org
Subject: [jira] [Updated] (MAHOUT-1273) Single Pass Algorithm for Penalized Linear Regression with Cross Validation on MapReduce

[ https://issues.apache.org/jira/browse/MAHOUT-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kun Yang updated MAHOUT-1273:
-----------------------------
    Attachment: java files.pdf
                Manual and Example.pdf
                Algorithm and Numeric Stability.pdf

> Single Pass Algorithm for Penalized Linear Regression with Cross Validation on MapReduce
> ----------------------------------------------------------------------------------------
>
> Key: MAHOUT-1273
> URL: https://issues.apache.org/jira/browse/MAHOUT-1273
> Project: Mahout
> Issue Type: New Feature
> Reporter: Kun Yang
> Attachments: Algorithm and Numeric Stability.pdf, java files.pdf, Manual and
Example.pdf, PenalizedLinear.pdf
>
> Original Estimate: 720h
> Remaining Estimate: 720h
>
> Penalized linear regression methods such as the Lasso and the Elastic-net are widely used in machine learning, but there are no very efficient, scalable implementations on MapReduce.
> The published distributed algorithms for this problem are either iterative (a poor fit for MapReduce, since each iteration needs another pass over the data; see Stephen Boyd's paper) or approximate (which does not help when exact solutions are needed; see parallelized stochastic gradient descent). Another disadvantage of these algorithms is that they cannot do cross validation in the training phase, so the user has to specify the penalty parameter in advance.
> My ideas can train the model with cross validation in a single pass. They are based on some simple observations.
> The core algorithm is a modified version of coordinate descent (see J. Friedman's paper). Friedman and his co-authors implemented a very efficient R package, "glmnet", which is the de facto standard for penalized regression.
> I have implemented a primitive version of this algorithm at Alpine Data Labs.
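The message does not spell out the "simple observations", so the following is only a sketch of one plausible reading, not the method from the attached PDFs. It assumes that the squared-error part of the objective depends on the data only through the sums X^T X, X^T y and y^T y, that these sums can be accumulated per fold in a single pass (mapper and combiner style), and that the training sums for a fold are the totals minus that fold's sums. Coordinate descent then runs on the sums alone. All names (`accumulate`, `combine`, `fit`, `cross_validate`), the round-robin fold assignment, and the omission of an intercept and of column standardization are assumptions made for illustration.

```python
def soft_threshold(z, g):
    """Soft-thresholding operator used by lasso-style coordinate descent."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def empty_stats(p):
    return {"n": 0, "yty": 0.0, "xty": [0.0] * p,
            "xtx": [[0.0] * p for _ in range(p)]}


def accumulate(stats, x, y):
    """Mapper-side update: fold one row (x, y) into the running sums."""
    stats["n"] += 1
    stats["yty"] += y * y
    for a, xa in enumerate(x):
        stats["xty"][a] += xa * y
        for b, xb in enumerate(x):
            stats["xtx"][a][b] += xa * xb


def combine(a, b, sign=1):
    """Reducer-side merge: a + sign * b (sign=-1 removes a held-out fold)."""
    p = len(a["xty"])
    return {
        "n": a["n"] + sign * b["n"],
        "yty": a["yty"] + sign * b["yty"],
        "xty": [a["xty"][j] + sign * b["xty"][j] for j in range(p)],
        "xtx": [[a["xtx"][j][k] + sign * b["xtx"][j][k] for k in range(p)]
                for j in range(p)],
    }


def fit(stats, lam, alpha=1.0, sweeps=500, tol=1e-10):
    """Elastic-net coordinate descent that touches only the aggregated sums.

    Assumes no all-zero column when lam * (1 - alpha) == 0.
    """
    n, p = stats["n"], len(stats["xty"])
    beta = [0.0] * p
    for _ in range(sweeps):
        biggest = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual, from the sums.
            rho = stats["xty"][j] - sum(
                stats["xtx"][j][k] * beta[k] for k in range(p) if k != j)
            new = soft_threshold(rho / n, lam * alpha) / (
                stats["xtx"][j][j] / n + lam * (1.0 - alpha))
            biggest = max(biggest, abs(new - beta[j]))
            beta[j] = new
        if biggest < tol:
            break
    return beta


def sse(stats, beta):
    """||y - X beta||^2 computed from the sums, without revisiting the rows."""
    p = len(beta)
    quad = sum(beta[j] * stats["xtx"][j][k] * beta[k]
               for j in range(p) for k in range(p))
    lin = sum(beta[j] * stats["xty"][j] for j in range(p))
    return stats["yty"] - 2.0 * lin + quad


def cross_validate(rows, lambdas, n_folds=5, alpha=1.0):
    """One pass over rows, then choose the lambda with lowest held-out error."""
    p = len(rows[0][0])
    folds = [empty_stats(p) for _ in range(n_folds)]
    for i, (x, y) in enumerate(rows):         # the only pass over the data
        accumulate(folds[i % n_folds], x, y)  # round-robin folds (assumed)
    total = empty_stats(p)
    for f in folds:
        total = combine(total, f)
    errors = {lam: sum(sse(f, fit(combine(total, f, -1), lam, alpha))
                       for f in folds)
              for lam in lambdas}
    best = min(errors, key=errors.get)
    return best, fit(total, best, alpha), errors
```

In this sketch the per-fold sums have size O(p^2) regardless of the number of rows, so the cross-validation loop over penalty values runs entirely on the reducer side. That is practical only when the number of features p is moderate; whether the proposed patch makes the same trade-off is not stated in the message.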