mahout-user mailing list archives

From Ted Dunning <>
Subject Re: Recommending on Dynamic Content
Date Thu, 03 Feb 2011 03:20:57 GMT

I am not entirely clear on what you are saying, but as far as I can understand
your points, I think I disagree.  Of course, if I haven't caught your drift, I
might be wrong and we might be in agreement.

On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <> wrote:

> both Elkan's work and Yahoo's paper are based on the notion (which is
> confirmed by SGD experience) that if we try to substitute missing data with
> neutral values, the whole learning falls apart. Sort of.

I don't see why you say that.  Elkan and Yahoo want to avoid the cold-start
process by using user and item offsets and by using latent factors to smooth
the recommendation process.

> I.e. if we always know some context A (in this case, static labels and
> dyadic ids) and only sometimes some context B, then assuming neutral values
> for context B if we are missing this data is invalid because we are actually
> substituting unknown data with made-up data.

This is so abstract that I don't really know what you are referring to.  Yes,
static characteristics will be used if they are available, and latent factors
will be used if they are available.
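For concreteness, here is a hypothetical sketch (not Mahout code, and not from either paper) of the point under debate: substituting a "neutral" value for a sometimes-missing context B, versus giving the model an explicit missing-indicator so it can learn a separate offset for the "B unknown" case.  All names and constants are made up for illustration.

```python
# Hypothetical illustration: imputing a neutral 0.0 for a missing
# context B vs. adding a missing-indicator feature.  Synthetic data:
# y = 2*a + 3*b, with b ~ N(1, 1) observed only half the time.
import random

random.seed(0)

samples = []
for _ in range(2000):
    a, b = random.gauss(0.0, 1.0), random.gauss(1.0, 1.0)
    samples.append((a, b if random.random() < 0.5 else None, 2 * a + 3 * b))

def sgd(encode, dim, lr=0.005, epochs=40):
    """Plain SGD on a linear model; encode() maps (a, b) to features."""
    w = [0.0] * dim
    for _ in range(epochs):
        for a, b, y in samples:
            x = encode(a, b)
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def mse(w, encode):
    return sum((sum(wi * xi for wi, xi in zip(w, encode(a, b))) - y) ** 2
               for a, b, y in samples) / len(samples)

# (1) Made-up data: impute a neutral 0.0 whenever b is missing.
impute = lambda a, b: [a, b if b is not None else 0.0]
# (2) No made-up data: zero out b but add an indicator feature, so the
#     model can learn an offset for the "b unknown" case instead.
indicator = lambda a, b: [a, b if b is not None else 0.0,
                          0.0 if b is not None else 1.0]

mse_impute = mse(sgd(impute, 2), impute)
mse_indicator = mse(sgd(indicator, 3), indicator)
```

Because the imputed 0.0 is not actually neutral here (b averages 1.0), the imputing model carries a bias on the missing half of the data that the indicator model absorbs into its learned offset, so `mse_indicator` comes out lower.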

> Which is why SGD produces higher errors than necessary on sparsified label
> data.  This is also the reason why SVD recommenders produce higher errors
> over sparse sample data as well (I think that's the consensus).

I don't think I am part of that consensus.

SGD produces very low errors when used with sparse data, but it can also
use non-sparse features just as well.  What do you mean by "higher errors
than necessary"?  That lower error rates are possible with latent factor
models?

> However, thinking in offline-ish mode, if we learn based on samples with A
> data, then freeze the learner and learn based on error between frozen
> learner for A and only the input that has context B, for learner B, then we
> are not making the mistake per above.  At no point does our learner take
> any 'made-up' data.

Are you talking about the alternating learning process in Menon and Elkan?
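The frozen-then-residual scheme described in the quote above can be sketched roughly as follows.  This is a hypothetical toy (one-weight linear learners on synthetic data, not code from Menon and Elkan or from Mahout); `fit_linear` stands in for whatever learner is actually used.

```python
# Hypothetical sketch: learn from the always-present context A, freeze
# that learner, then fit a second learner on the residual using context
# B only where B is actually observed -- no substituted neutral values.
import random

random.seed(1)

def fit_linear(pairs, lr=0.002, epochs=300):
    """One-weight linear regression by SGD; stands in for any learner."""
    w = 0.0
    for _ in range(epochs):
        for x, y in pairs:
            w -= lr * (w * x - y) * x
    return w

# Synthetic data: y = 2*a + 3*b, with b observed only half the time.
data = []
for _ in range(1000):
    a, b = random.gauss(0, 1), random.gauss(0, 1)
    data.append((a, b if random.random() < 0.5 else None, 2 * a + 3 * b))

# Stage 1: learner A sees every sample, but only the A context.
w_a = fit_linear([(a, y) for a, _, y in data])

# Stage 2: freeze w_a; learner B fits the residual y - w_a*a on just
# the samples where B exists.
w_b = fit_linear([(b, y - w_a * a) for a, b, y in data if b is not None])
```

With a and b independent, stage 1 recovers the coefficient on a (about 2) despite the unmodeled b term, and stage 2 recovers the coefficient on b (about 3) from the residuals of the frozen first stage.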

> This whole notion is based on the Bayesian inference process: what can you say
> if you only know A, and what correction would you make if you also knew B?


The process is roughly analogous to an EM algorithm, but only loosely.

> Both papers do a corner case out of this: we have two types of data, A and
> B, and we learn A, then freeze learner A, then learn B where available.
> But the general case doesn't have to be just A and B.  Actually that's our
> case (our CEO calls it the 'trunk-branch-leaf' case): we always know some
> context A, and sometimes B, and also sometimes we know all of A, B and some
> additional context C.
> So there's a case to be made to generalize the inference architecture:
> specify the hierarchy and then learn A/B/C, SGD+loglinear, or whatever else.
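As a toy version of what this trunk-branch-leaf hierarchy might look like (a hypothetical sketch only, not anyone's actual architecture): a chain of learners where each level is trained on the residual of the frozen levels above it, using only the samples where its context is known.

```python
# Hypothetical cascade for contexts A (always known), B (sometimes),
# and C (known only when B is).  Each level fits the residual left by
# the frozen levels above it.  Synthetic target: y = a + 2*b + 3*c.
import random

random.seed(2)

def fit(pairs, lr=0.002, epochs=300):
    """One-weight linear regression by SGD; stands in for any learner."""
    w = 0.0
    for _ in range(epochs):
        for x, y in pairs:
            w -= lr * (w * x - y) * x
    return w

data = []
for _ in range(1000):
    a, b, c = (random.gauss(0, 1) for _ in range(3))
    has_b = random.random() < 0.6
    has_c = has_b and random.random() < 0.5
    data.append((a, b if has_b else None, c if has_c else None,
                 a + 2 * b + 3 * c))

w = {}
w["A"] = fit([(a, y) for a, b, c, y in data])
w["B"] = fit([(b, y - w["A"] * a)
              for a, b, c, y in data if b is not None])
w["C"] = fit([(c, y - w["A"] * a - w["B"] * b)
              for a, b, c, y in data if c is not None])
```

Each level trains only on the data it legitimately has, so no made-up values enter anywhere; the coefficients on a, b, and c (1, 2, and 3) are recovered level by level.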

I think that these analogies are very strained.
