Subject: Re: Regression+SGD question
From: Dmitriy Lyubimov
To: dev@mahout.apache.org
Date: Sun, 5 Sep 2010 13:12:10 -0700

Ted, thank you very much.

I would like to discuss one more generalization here, if I may.

Let's consider the Netflix prize problem for the moment. That is, the regression parameters are non-quantitative ones (person and movie ids, essentially), and the regressand is the user's score. I guess many are familiar with Yehuda Koren's approach to this, where he basically used SGD as a non-negative factorization, and he also mentioned something about applying a logistic function on top of it. I.e. the regression looks exactly like it would for logistic regression (he also added biases), with the exception that it is more of a non-negative one (the factors are not allowed to go negative).
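To fix notation, here is a rough sketch (not Mahout code) of the kind of SGD update I have in mind for that factorization. The class and method names, the learning rate and the regularization constant are all made up for illustration, and the logistic link plus the non-negativity projection are just my reading of the approach:

    // Sketch only: one SGD step of a Koren-style factorization with a
    // logistic link and non-negative factors.  Names and constants are
    // invented for illustration.
    public class NonNegativeSgdSketch {

      static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
      }

      /**
       * One stochastic update for a single (person, movie, liked) observation.
       * p and q are the k-dimensional factor rows for this person and movie;
       * they are updated in place and clamped at zero to stay non-negative.
       */
      static void sgdStep(double[] p, double[] q, double liked,
                          double learningRate, double lambda) {
        double dot = 0.0;
        for (int f = 0; f < p.length; f++) {
          dot += p[f] * q[f];
        }
        // predicted probability that this (person, movie) pair is a "like"
        double prediction = sigmoid(dot);
        // gradient of the log-likelihood with respect to the dot product
        double error = liked - prediction;

        for (int f = 0; f < p.length; f++) {
          double pf = p[f];
          double qf = q[f];
          // usual coupled SGD update with L2 shrinkage ...
          p[f] = pf + learningRate * (error * qf - lambda * pf);
          q[f] = qf + learningRate * (error * pf - lambda * qf);
          // ... followed by a crude projection back onto the non-negative orthant
          p[f] = Math.max(0.0, p[f]);
          q[f] = Math.max(0.0, q[f]);
        }
      }
    }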
The problem I currently have on my hands is a hybrid of those. Imagine that, in addition to the non-quantitative features (person, movie), you know some quantitative features about the movie -- say, genre scores that come out of some sort of encyclopedic database, i.e. a manually trained taxonomy. (You might also know some quantitative features about the person, but let's keep it simple for the purpose of this discussion.)

It's very easy for me to go in and create an individual regression for a user based on their reaction (liked / didn't like) and what I know of the quantitative qualities of the movies. However, at some point I start feeling that the movie genre ratings are not enough: some movies still have some pretty unique factors about them that we don't really know or haven't rated as a feature.

So what I really want is probably a non-negative factorization, but one that also takes into account the quantitative features that come from different aspects of a given (person, movie) interaction -- movie genre, time of day, weather outside, etc., whatever we think has a good chance of being a useful feature, without really going through a PCA or feature selection process at the moment. So for the quantitative features we would search for regression parameters, but for the non-quantitative features (person, movie) I'd still prefer to have the biggest non-negative factors learned from history.

Is there a way to merge those two approaches into one, since they seem to be really similar (i.e. regression with non-negative factorization)? Intuitively I feel that the approaches are really close (the difference is that in NNF we are essentially guessing the principal-factor inputs), and there must be a relatively simple way to morph it all into a hybrid approach where some of the betas interact with quantitative features x, while other ones interact with the non-negative factors associated with the non-quantitative inputs (such as the person id) encountered in the sample.
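Here is a rough sketch of the hybrid update I mean -- again not Mahout code, the names and constants are invented. It simply combines an ordinary logistic-regression update for the quantitative betas with a projected (non-negative) factorization update for the id factors in one SGD step:

    // Sketch only: hybrid of logistic regression over quantitative features
    // and non-negative factors for the (person, movie) ids.
    public class HybridSgdSketch {

      static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
      }

      /**
       * One stochastic step for a single observation.
       * x holds the quantitative features (genre scores, time of day, ...)
       * with x[0] fixed at 1.0 as the intercept term; userFactors and
       * itemFactors are the non-negative factor rows for the person and
       * movie ids in the sample.  All parameter arrays are updated in place.
       */
      static void sgdStep(double[] beta, double[] x,
                          double[] userFactors, double[] itemFactors,
                          double liked, double learningRate, double lambda) {
        // linear part over the quantitative features
        double score = 0.0;
        for (int j = 0; j < beta.length; j++) {
          score += beta[j] * x[j];
        }
        // factor part over the non-quantitative ids (person, movie)
        for (int f = 0; f < userFactors.length; f++) {
          score += userFactors[f] * itemFactors[f];
        }

        double error = liked - sigmoid(score);

        // ordinary logistic-regression update for the betas (free to go negative)
        for (int j = 0; j < beta.length; j++) {
          beta[j] += learningRate * (error * x[j] - lambda * beta[j]);
        }
        // factorization update, projected back so the factors stay non-negative
        for (int f = 0; f < userFactors.length; f++) {
          double pf = userFactors[f];
          double qf = itemFactors[f];
          userFactors[f] = Math.max(0.0, pf + learningRate * (error * qf - lambda * pf));
          itemFactors[f] = Math.max(0.0, qf + learningRate * (error * pf - lambda * qf));
        }
      }
    }

The betas over the quantitative features are free to go negative, while the person/movie factors are clamped at zero -- which is exactly the split I'm trying to describe.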
Does it make sense? Is there a way to do this in Mahout?

Thank you very much.
-Dmitriy.

On Sat, Sep 4, 2010 at 3:05 PM, Ted Dunning wrote:

> I generally add in the constant term to the feature vector if I want to
> use it. You are correct that it is usually critical to correct
> functioning, but I prefer not to have a special case for it. The one
> place where I think that is wrong is where you want the prior to treat
> it specially: it is common to have a very different prior on the
> intercept than on the coefficients. My only defense there is that common
> priors for the coefficients, like L1, allow plenty of latitude on the
> intercept, so that as long as the data outweigh the prior, this doesn't
> matter. There is a similar distinction between interactions and main
> effects.
>
> One place it would matter a lot is in multi-level inference, where you
> wind up with a pretty strong prior from the higher-level regressions
> (since that is where most of the data actually is). In that case, I
> would probably rather separate the handling. In fact, at that point, I
> think I would probably go with a grouped prior to allow handling all of
> these cases in a coherent setting.
>
> On the second question, betas can definitely go negative. That is how
> the model expresses an effect that decreases the likelihood of success.
>
> On Sat, Sep 4, 2010 at 1:28 PM, Dmitriy Lyubimov wrote:
>
> > There's something I don't understand about your derivation.
> >
> > I think Bishop generally suggests that in linear regression
> > y = beta_0 + beta'x (so there's an intercept), and I think he uses a
> > similar approach when fitting to a logistic function, where I think he
> > suggests using P( [mu + beta'x] / s ), which of course can be thought
> > of again as P( beta_0 + beta'x ).
> >
> > But if there's no intercept beta_0, then y(x = (0,...,0)^T | beta) is
> > always 0, which of course is not true in most situations. Does your
> > method imply that a trivial input (all 0s) would produce a 0 estimate?
> >
> > Second question: are the betas allowed to go negative?
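P.S. For what it's worth, a minimal illustration of the constant-term trick Ted describes above, as I understand it (the numbers are made up): with x[0] fixed at 1.0, beta[0] plays the role of beta_0, so the all-zero input no longer pins the score at 0.

    // Fold the intercept into the feature vector as a constant 1.0 so it is
    // learned like any other coefficient.  Values here are illustrative only.
    public class InterceptAsFeature {
      public static void main(String[] args) {
        double[] rawFeatures = {0.0, 0.0, 0.0};      // "trivial" all-zero input

        // prepend the constant term
        double[] x = new double[rawFeatures.length + 1];
        x[0] = 1.0;
        System.arraycopy(rawFeatures, 0, x, 1, rawFeatures.length);

        double[] beta = {-1.2, 0.7, 0.0, 0.3};        // beta[0] is the learned intercept

        double score = 0.0;
        for (int j = 0; j < x.length; j++) {
          score += beta[j] * x[j];
        }
        // score equals beta[0] here, not 0, even though all real features are zero
        System.out.println("score = " + score);
      }
    }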