Yes, I was referring to Andrea Montanari. My apologies for the "guy from
Stanford" reference. I wasn't aware of the paper, but I was present at his
talk about this work; it was quite informative.
On Wed, Feb 2, 2011 at 11:49 PM, Federico Castanedo <fcastane@inf.uc3m.es> wrote:
> Hi all,
>
> Dmitriy, I guess you are talking about this paper by Andrea Montanari, am I
> correct?
>
> Matrix Completion from Noisy Entries. http://arxiv.org/abs/0906.2027v1
>
> 2011/2/3 Dmitriy Lyubimov <dlieu.7@gmail.com>
>
> > I am basically retracing a generalization of the Bayesian inference
> > problem given in the Yahoo paper. I am too lazy to go back for a quote.
> >
> > The SVD problem was discussed at meetups. Basically, the criticism
> > here is that for an RxC matrix, whenever there's a missing measurement,
> > one can't specify 'no measurement' but rather has to leave it at some
> > neutral value (0? average?), which is essentially nothing but noise
> > since it's not a sample. As one guy from Stanford demonstrated on
> > Netflix data, the whole system collapses very quickly after a certain
> > threshold of sample sparsity is reached.
> >
> > On Wed, Feb 2, 2011 at 7:20 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > Dmitriy,
> > > I am not entirely clear on what you are saying, but as far as I can
> > > understand your points, I think I disagree. Of course, if I don't catch
> > > your drift, I might be wrong and we might be in agreement.
> > >
> > > On Wed, Feb 2, 2011 at 2:43 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> > >>
> > >> Both Elkan's work and Yahoo's paper are based on the notion (which is
> > >> confirmed by SGD experience) that if we try to substitute missing data
> > >> with neutral values, the whole learning falls apart. Sort of.
> > >
> > > I don't see why you say that. Elkan and Yahoo want to avoid the cold
> > > start process by using user and item offsets and by using latent factors
> > > to smooth the recommendation process.
> > >
> > >>
> > >> I.e., if we always know some context A (in this case, static labels and
> > >> dyadic ids) and only sometimes some context B, then assuming neutral
> > >> values for context B when this data is missing is invalid, because we
> > >> are actually substituting made-up data for unknown data.
> > >
> > > This is so abstract that I don't really know what you are referring to.
> > > Yes, static characteristics will be used if they are available and latent
> > > factors will be used if they are available.
> > >
> > >>
> > >> Which is why SGD produces higher errors than necessary on sparsified
> > >> label data. This is also the reason why SVD recommenders produce higher
> > >> errors over sparse sample data (I think that's the consensus).
> > >
> > > I don't think I am part of that consensus.
> > > SGD produces very low errors when used with sparse data. But it can also
> > > use nonsparse features just as well. What do you mean by "higher errors
> > > than necessary"? That lower error rates are possible with latent factor
> > > techniques?
> > >
> > >>
> > >> However, thinking in offline-ish mode, if we learn based on samples with
> > >> A data, then freeze the learner and train learner B on the error of the
> > >> frozen learner for A, using only the input that has context B, then we
> > >> are not making the mistake described above. At no point does our learner
> > >> take any 'made-up' data.
> > >
> > > Are you talking about the alternating learning process in Menon and
> > > Elkan?
> > >
> > >>
> > >> This whole notion is based on the Bayesian inference process: what can
> > >> you say if you only know A, and what correction would you make if you
> > >> also knew B?
> > >
> > > ?!??
> > > The process is roughly analogous to an EM algorithm, but not very.
> > >
> > >>
> > >> Both papers handle a corner case of this: we have two types of data, A
> > >> and B, and we learn A, then freeze learner A, then learn B where
> > >> available.
> > >>
> > >> But the general case doesn't have to be just A and B. Actually that's
> > >> our case (our CEO calls it the 'trunkbrunchleaf' case): we always know
> > >> some context A, and sometimes B, and also sometimes we know all of A, B
> > >> and some additional context C.
> > >>
> > >> So there's a case to be made to generalize the inference architecture:
> > >> specify the hierarchy and then learn A/B/C, SGD+loglinear, or whatever
> > >> else.
> > >
> > > I think that these analogies are very strained.
> > >
> > >
> >
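To make the zero-fill criticism quoted above concrete, here is a minimal
sketch in plain Python. It is not Mahout code and not the method from the
Montanari paper; it only contrasts treating a gap as 'no measurement' with
filling it with a neutral 0. The toy 2x2 matrix, the helper names rank1_als
and sse, and the iteration count are all assumptions made for illustration,
and rank-1 alternating least squares stands in for a real SVD recommender.

```python
def rank1_als(cells, n_rows, n_cols, iters=200):
    """Fit m[i][j] ~ u[i] * v[j] by alternating least squares,
    using only the (i, j) -> value pairs present in `cells`."""
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # Closed-form update of each row factor, column factors held fixed.
        for i in range(n_rows):
            num = sum(val * v[j] for (r, j), val in cells.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in cells if r == i)
            u[i] = num / den
        # Same update for each column factor, row factors held fixed.
        for j in range(n_cols):
            num = sum(val * u[i] for (i, c), val in cells.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in cells if c == j)
            v[j] = num / den
    return u, v


def sse(cells, u, v):
    """Sum of squared errors over the given cells only."""
    return sum((val - u[i] * v[j]) ** 2 for (i, j), val in cells.items())


# Hypothetical toy data: the true matrix is [[1, 2], [2, 4]],
# and cell (1, 1) was never measured.
observed = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0}

# Zero-fill variant: the gap is replaced by a "neutral" value,
# which the fit then treats as a real sample.
zero_filled = dict(observed)
zero_filled[(1, 1)] = 0.0

u_m, v_m = rank1_als(observed, 2, 2)
u_z, v_z = rank1_als(zero_filled, 2, 2)

masked_guess = u_m[1] * v_m[1]        # estimate for the unmeasured cell
zero_fill_guess = u_z[1] * v_z[1]
masked_err = sse(observed, u_m, v_m)  # error on real measurements only
zero_fill_err = sse(observed, u_z, v_z)

print(masked_guess, zero_fill_guess, masked_err, zero_fill_err)
```

In this toy setup the masked fit recovers the unmeasured cell (true value 4)
and reproduces the real measurements, while the zero-filled fit is pulled
toward the made-up 0 and fits the real measurements worse. It says nothing
about where the sparsity threshold mentioned above actually lies.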
>
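The "learn A, freeze, then learn B where available" idea from the oldest
message in the thread can be sketched the same way. This is only a toy
illustration under stated assumptions: one-feature least squares stands in
for SGD, the samples are invented, and the names fit_line, predict_a and
predict are hypothetical. It is not the procedure from Menon and Elkan or
from the Yahoo paper.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ w * x + c for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov / var_x
    return w, mean_y - w * mean_x


# Hypothetical samples (a, b, y): context A is always known,
# context B only sometimes (None means "no measurement").
samples = [
    (1.0, None, 2.0), (2.0, 1.0, 7.0), (3.0, None, 6.0),
    (4.0, -1.0, 5.0), (5.0, None, 10.0), (6.0, 2.0, 18.0),
]

# Stage 1: learner A sees every sample, but only the A feature.
w_a, c_a = fit_line([a for a, _, _ in samples], [y for _, _, y in samples])


def predict_a(a):
    return w_a * a + c_a


# Stage 2: learner A is frozen. Learner B fits A's residual error,
# and only on samples where B was actually observed, so no neutral
# or made-up B value is ever fed to it.
with_b = [(a, b, y) for a, b, y in samples if b is not None]
w_b, c_b = fit_line([b for _, b, _ in with_b],
                    [y - predict_a(a) for a, _, y in with_b])


def predict(a, b=None):
    """A-only estimate, plus B's correction when B is known."""
    base = predict_a(a)
    return base if b is None else base + w_b * b + c_b


# Compare errors on the samples that carry context B.
sse_a_only = sum((y - predict_a(a)) ** 2 for a, _, y in with_b)
sse_staged = sum((y - predict(a, b)) ** 2 for a, b, y in with_b)

print(sse_a_only, sse_staged)
```

When B is missing, predict falls back to the frozen A-only estimate, and on
the samples that do carry B the correction reduces the squared error. Adding
a third stage for context C would follow the same pattern: freeze A and B,
then fit C on their combined residual where C is observed.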
