mahout-dev mailing list archives

From Dmitriy Lyubimov <>
Subject Re: Regression+SGD question
Date Sun, 05 Sep 2010 20:12:10 GMT
Ted, thank you very much.

I would like to discuss one more generalization here, if I may.

Let's consider the Netflix Prize problem for the moment. That is, the inputs
of the regression are non-quantitative ones (essentially person and movie ids).
The regressand is the user's score. I guess many are familiar with Yehuda
Koren's approach to this, where he basically used SGD as a non-negative
factorization, and he also mentioned something about applying a logistic
function on top of it. I.e. the regression looks exactly like it would for
logistic regression (he also added biases), with the exception that it is more
of a non-negative one (factors are not allowed to go negative).

The problem I currently have on my hands is a hybrid of those. I.e. imagine
that in addition to some non-quantitative features (person, movie) you know
some quantitative features about the movie (say, genre scores that come out of
some sort of encyclopedic database, i.e. a manually trained taxonomy). (You
might also know some quantitative features about the person, but let's keep
it simple for the purpose of this discussion.)

It's very easy for me to go in and create an individual regression for a user
based on their reaction (liked / didn't like) and what I know of the
quantitative qualities of movies.

However, at some point I start feeling like movie genre ratings are not
enough. Some movies still have some pretty unique factors about them that we
don't really know about or haven't rated as a feature.

So what I really want is probably a non-negative factorization, but one that
takes into account quantitative features that come from different aspects of
a given instance of a (person, movie) interaction (movie genre, time of day,
weather outside, etc., whatever we think may have a good chance to be a good
feature without really going through a PCA or feature selection process at the
So when encountering quantitative features we may search for regression
parameters, but for non-quantitative features (person, movie) I'd still
prefer to have the biggest non-negative factors learned from history.

Is there a way to merge both of those approaches into one, as they seem to be
really similar (i.e. regression with non-negative factorization)?

Intuitively I feel that those approaches are really similar (the difference is
that in NNF we are essentially guessing the principal factors of the input).
And there must be a relatively simple way to morph it all into a hybrid
approach where some of the betas interact with quantitative features x while
others interact with non-negative factors associated with the
non-quantitative input (such as person id) encountered in the sample.
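
A minimal sketch of what such a hybrid could look like, under stated
assumptions: the logit is an intercept plus <beta, x> over the quantitative
features plus the inner product of latent user and item factors, and the
factors are kept non-negative by clipping at zero after every SGD step
(projected SGD). All names here (HybridModel, train, log_loss) and the
learning-rate and regularization values are hypothetical; this is neither
Koren's method nor anything that exists in Mahout.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class HybridModel:
    """Logistic regression on quantitative features x, plus non-negative
    latent factors for the non-quantitative ids (user, item)."""

    def __init__(self, n_users, n_items, n_features, k=2,
                 lr=0.05, reg=0.01, seed=0):
        rng = random.Random(seed)
        self.bias = 0.0                    # intercept beta_0
        self.beta = [0.0] * n_features     # betas may go negative
        # latent factors start small and non-negative
        self.p = [[rng.uniform(0.0, 0.1) for _ in range(k)]
                  for _ in range(n_users)]
        self.q = [[rng.uniform(0.0, 0.1) for _ in range(k)]
                  for _ in range(n_items)]
        self.lr = lr
        self.reg = reg

    def predict(self, u, i, x):
        z = self.bias
        z += sum(b * xj for b, xj in zip(self.beta, x))
        z += sum(pu * qi for pu, qi in zip(self.p[u], self.q[i]))
        return sigmoid(z)

    def update(self, u, i, x, y):
        # gradient of the log-likelihood with respect to the logit
        err = y - self.predict(u, i, x)
        self.bias += self.lr * err
        for j, xj in enumerate(x):
            self.beta[j] += self.lr * (err * xj - self.reg * self.beta[j])
        pu, qi = self.p[u], self.q[i]
        for f in range(len(pu)):
            puf, qif = pu[f], qi[f]
            # gradient step, then project back onto the non-negative orthant
            pu[f] = max(0.0, puf + self.lr * (err * qif - self.reg * puf))
            qi[f] = max(0.0, qif + self.lr * (err * puf - self.reg * qif))


def train(model, data, epochs=200):
    for _ in range(epochs):
        for u, i, x, y in data:
            model.update(u, i, x, y)


def log_loss(model, data):
    total = 0.0
    for u, i, x, y in data:
        p = model.predict(u, i, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)
```

Each training sample is a tuple (user, item, x, y) with y in {0, 1}, e.g.
(0, 1, [0.3, 0.7], 1). Only the latent factors are clipped; the betas on the
quantitative features stay unconstrained, so a feature can still lower the
predicted probability.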

Does it make sense? Is there a way to do this in Mahout?

Thank you very much.

On Sat, Sep 4, 2010 at 3:05 PM, Ted Dunning <> wrote:

> I generally add in the constant term to the feature vector if I want to use
> it.  You are correct that it is usually critical to correct function, but I
> prefer to not have a special case for it.  The one place where I think that
> is wrong is where you want to have special treatment by the prior.  It is
> common to have a very different prior on the intercept than on the
> coefficients.  My only defense there is that common priors for the
> coefficients like L1 allow for plenty of latitude on the intercept so that
> as long as the data outweigh the prior, this doesn't matter.  There is a
> similar distinctive effect between interactions and main effects.
>
> One place it would matter a lot is in multi-level inference where you wind
> up with a pretty strong prior from the higher level regressions (since that
> is where most of the data actually is).  In that case, I would probably
> rather separate the handling.  In fact, at that point, I think I would
> probably go with a grouped prior to allow handling all of these cases in a
> coherent setting.
>
> On the second question, betas can definitely go negative.  That is how the
> model expresses an effect that decreases the likelihood of success.
>
> On Sat, Sep 4, 2010 at 1:28 PM, Dmitriy Lyubimov <> wrote:
> > There's something i don't understand about your derivation.
> >
> > I think Bishop generally suggests that in linear regression y=beta_0 +
> > <beta, x> (so there's an intercept)
> > and i think he uses similar approach with fitting to logistic function
> > where i think he suggests to use P( [mu + <beta,x>]/s )
> > which of course can be thought of again as P(beta_0+<beta,x>)
> >
> > but if there's no intercept beta_0, then y(x=(0,...0)^T | beta) is always
> > 0. Which is not true of course in most situations. Does your method imply
> > that having trivial input (all 0s ) would produce 0 estimation?
> >
> > Second question, are the betas allowed to go negative?
> >
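
The two points in the quoted exchange can be illustrated with a small sketch,
assuming a plain logistic model P(sigmoid(<beta, x>)). The predict helper and
its add_constant flag are hypothetical names for illustration, not Mahout
API. Without an intercept, an all-zero input always gives a linear predictor
of 0, i.e. a probability of exactly 0.5 whatever the betas are; prepending a
constant 1 to the feature vector turns the first beta into the intercept; and
a negative beta lowers the predicted probability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, x, add_constant=False):
    """Logistic prediction. With add_constant=True a constant 1.0 is
    prepended to x, so beta[0] plays the role of the intercept beta_0."""
    if add_constant:
        x = [1.0] + list(x)
    return sigmoid(sum(b * xi for b, xi in zip(beta, x)))


# No intercept: the all-zero input is stuck at sigmoid(0).
no_intercept = predict([2.0, -1.0], [0.0, 0.0])

# Constant term in the feature vector: the all-zero input now yields
# sigmoid(beta_0), which the data can move away from 0.5.
with_intercept = predict([-1.0, 2.0, -1.0], [0.0, 0.0], add_constant=True)

# A negative beta expresses an effect that decreases the probability.
positive_effect = predict([1.0], [1.0])
negative_effect = predict([-1.0], [1.0])
```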
