mahout-user mailing list archives

From Dmitriy Lyubimov <>
Subject Re: Recommending on Dynamic Content
Date Thu, 03 Feb 2011 04:54:08 GMT
I am basically retracing the generalization of the Bayesian inference
problem given in the Yahoo paper. I am too lazy to go back for a quote.

The SVD problem was discussed at meetups; basically, the criticism
here is that for an RxC matrix, whenever there's a missing measurement,
one can't specify 'no measurement' but rather has to leave it at some
neutral value (0? the average?), which is essentially nothing but noise,
since it's not a sample. As one guy from Stanford demonstrated on
Netflix data, the whole system collapses very quickly after a certain
threshold of sample sparsity is reached.
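
To make that concrete, here is a toy numpy sketch (the matrix size, rank,
and densities below are all made up for illustration): fill the missing
entries of a low-rank "ratings" matrix with the neutral value 0, take a
truncated SVD, and watch the held-out reconstruction error grow as the
observed density drops.

    import numpy as np

    rng = np.random.default_rng(0)
    R, C, k = 200, 100, 5
    true = rng.normal(size=(R, k)) @ rng.normal(size=(k, C))  # rank-k "ratings"

    for density in (0.9, 0.5, 0.1, 0.02):
        mask = rng.random((R, C)) < density        # observed entries
        filled = np.where(mask, true, 0.0)         # missing -> "neutral" 0
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :k] * s[:k]) @ Vt[:k]       # rank-k reconstruction
        rmse = np.sqrt(np.mean((approx[~mask] - true[~mask]) ** 2))
        print(f"density={density:.2f}  held-out rmse={rmse:.3f}")

The zeros act exactly like the noise described above: the factorization
treats them as real samples, and past a certain sparsity they dominate it.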

On Wed, Feb 2, 2011 at 7:20 PM, Ted Dunning <> wrote:
> Dmitriy,
> I am not entirely clear on what you are saying, but as far as I can understand
> your points, I think I disagree.  Of course, if I don't catch your drift, I
> might be wrong and we might be in agreement.
> On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <> wrote:
>> both Elkan's work and Yahoo's paper are based on the notion (which is
>> confirmed by SGD experience) that if we try to substitute missing data with
>> neutral values, the whole learning falls apart. Sort of.
> I don't see why you say that.  Elkan and Yahoo want to avoid the cold-start
> problem by using user and item offsets and by using latent factors to smooth
> the recommendation process.
>> I.e. if we always know some context A (in this case, static labels and
>> dyadic ids) and only sometimes some context B, then assuming neutral values
>> for context B when we are missing that data is invalid, because we are
>> actually substituting unknown data with made-up data.
> This is so abstract that I don't know what you are referring to, really.  Yes,
> static characteristics will be used if they are available, and latent factors
> will be used if they are available.
>> Which is why SGD produces higher errors than necessary on sparsified label
>> data. This is also the reason why SVD recommenders produce higher errors on
>> sparse sample data (I think that's the consensus).
> I don't think I am part of that consensus.
> SGD produces very low errors when used with sparse data.  But it can also
> use non-sparse features just as well.  What do you mean by "higher errors
> than necessary"?  That lower error rates are possible with latent factor
> techniques?
>> However, thinking in offline-ish mode: if we train a learner on samples with
>> A data, then freeze that learner, and train learner B only on the inputs that
>> also have context B, fitting the error left by the frozen A learner, then we
>> are not making the mistake described above. At no point does our learner take
>> any 'made-up' data.
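
Roughly, the scheme looks like this in a toy setting (the features, sizes,
the 30% availability, and the least-squares learners are all invented for
illustration; this is not either paper's algorithm):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    XA = rng.normal(size=(n, 4))                 # context A: always observed
    XB = rng.normal(size=(n, 2))                 # context B: sometimes observed
    y = XA @ [1.0, -2.0, 0.5, 0.0] + XB @ [3.0, 1.0] + rng.normal(scale=0.1, size=n)
    has_b = rng.random(n) < 0.3                  # B is known for ~30% of rows

    wa, *_ = np.linalg.lstsq(XA, y, rcond=None)  # learner A on all rows...
    resid = y - XA @ wa                          # ...then frozen; keep its error
    wb, *_ = np.linalg.lstsq(XB[has_b], resid[has_b], rcond=None)  # B on residuals

    pred = XA @ wa + np.where(has_b, XB @ wb, 0.0)

The key property is the last line: where B is absent, the correction is
simply not applied. No neutral stand-in for XB ever enters either learner.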
> Are you talking about the alternating learning process in Menon and Elkan?
>> This whole notion is based on the Bayesian inference process: what can you
>> say if you only know A, and what correction would you make if you also knew B?
> ?!??
> The process is roughly analogous to an EM algorithm, but not very.
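
For reference, the Bayesian reading of the quoted sentence is just the
usual conditioning chain; in LaTeX (with y standing for whatever is being
predicted, a notation assumed here, not one from the papers):

    % prior from the always-available context A, corrected once B arrives
    p(y \mid A, B) \;\propto\; p(B \mid y, A)\, p(y \mid A)

That is: learn p(y | A) first, and treat the B model purely as a
correction applied on top of it when B is actually observed.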
>> Both papers make a corner case out of this: we have two types of data, A and
>> B; we learn A, then freeze learner A, then learn B where available.
>> But the general case doesn't have to be just A and B. Actually that's our
>> case (our CEO calls it the 'trunk-branch-leaf' case): we always know some
>> context A, sometimes B, and also sometimes we know all of A, B and some
>> additional context C.
>> So there's a case to be made for generalizing the inference architecture:
>> specify the hierarchy and then learn A/B/C, SGD+loglinear, or whatever else.
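
A minimal sketch of one way that trunk-branch-leaf hierarchy could be
wired, continuing the invented least-squares toy from above (again an
assumption, not a design from either paper): each level trains on the
residual left by the levels above it, using only the rows where its
context actually exists.

    import numpy as np

    def fit_cascade(levels, y):
        """levels: list of (X, available_mask) pairs, trunk first."""
        weights, resid = [], np.asarray(y, dtype=float)
        for X, avail in levels:
            w, *_ = np.linalg.lstsq(X[avail], resid[avail], rcond=None)
            weights.append(w)                            # freeze this level...
            resid = resid - np.where(avail, X @ w, 0.0)  # ...pass its error down
        return weights

    def predict_cascade(levels, weights):
        total = 0.0
        for (X, avail), w in zip(levels, weights):
            total = total + np.where(avail, X @ w, 0.0)  # missing level: no correction
        return total

A level never sees rows where its context is missing, and at prediction
time a missing level simply contributes nothing, so no made-up neutral
values enter at any point in the hierarchy.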
> I think that these analogies are very strained.
