Asking because I am considering pulling this implementation, but for some
(mostly political) reasons people want to try different things here.
I may also have to start with a different way of constructing
cooccurrences, and may do a few optimizations there (e.g. the priority queue
queuing/enqueuing does twice the work it really needs to do, etc.).
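
For illustration only, here is one way a bounded top-k queue can avoid doing
a separate remove and insert per element. This is a minimal Python sketch and
not the code under discussion; the function name top_k and the
(score, item) tuple layout are assumptions made up for the example. It relies
on the standard library's heapq.heapreplace, which swaps out the smallest
element and restores the heap in a single sift.

```python
import heapq


def top_k(scored_items, k):
    """Return the k highest-scoring (score, item) pairs, best first.

    Hypothetical helper for illustration; not part of any library.
    """
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the weakest of the current top k
    for score, item in scored_items:
        if len(heap) < k:
            # Still filling the queue: a plain push is all that is needed.
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            # Replace the weakest entry in one sift instead of pop + push.
            heapq.heapreplace(heap, (score, item))
        # Otherwise the candidate cannot enter the top k; skip it untouched.
    return sorted(heap, reverse=True)
```

Candidates that cannot beat the current minimum are rejected with a single
comparison, so the heap is only touched when its contents actually change.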
On Wed, Aug 6, 2014 at 5:05 PM, Sebastian Schelter <ssc.open@googlemail.com>
wrote:
> I chose against porting all the similarity measures to the DSL version of
> the cooccurrence analysis for two reasons. First, adding the measures in a
> generalizable way makes the code super hard to read. Second, in practice, I
> have never seen anything give better results than LLR. As Ted pointed
> out, a lot of the foundations of using similarity measures come from
> wanting to predict ratings, which people never do in practice. I think we
> should restrict ourselves to approaches that work with implicit, count-like
> data.
>
> s
> On 06.08.2014 16:58, "Ted Dunning" <ted.dunning@gmail.com> wrote:
>
> > On Wed, Aug 6, 2014 at 5:49 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> > wrote:
> >
> > > On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> > > wrote:
> > >
> > > > I suppose in that context LLR is considered a distance (higher scores
> > > > mean more `distant` items, cooccurring by chance only)?
> > > >
> > >
> > > Self-correction on this one: having given a quick look at the LLR paper
> > > again, it looks like it is actually a similarity (higher scores meaning
> > > more stable cooccurrences), i.e. it moves in the opposite direction of
> > > the p-value if it had been a classic test.
> > >
> >
> > LLR is a classic test. It is essentially Pearson's chi^2 test without the
> > normal approximation. See my papers[1][2] introducing the test into
> > computational linguistics (which ultimately brought it into all kinds of
> > fields including recommendations) and also references for the G^2 test[3].
> >
> > [1] http://www.aclweb.org/anthology/J93-1003
> > [2] http://arxiv.org/abs/1207.1847
> > [3] http://en.wikipedia.org/wiki/G-test
> >
>
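
To make the exchange above concrete, here is a rough Python sketch of the
G^2 (log-likelihood ratio) statistic for a 2x2 cooccurrence table, together
with the p-value it would map to under the usual chi-squared approximation
with one degree of freedom. The function names (x_log_x, llr, p_value) and
the cell names k11..k22 are assumptions chosen for the example, and this is
not a copy of any library's implementation. It shows the point made in the
thread: a higher score means a stronger association, so the score moves
opposite to the p-value.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def llr(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of counts.

    k11: both events together, k12 / k21: one without the other,
    k22: neither. Computes 2 * sum(k_ij * ln(k_ij * N / (row_i * col_j))).
    """
    n = k11 + k12 + k21 + k22
    cells = x_log_x(k11) + x_log_x(k12) + x_log_x(k21) + x_log_x(k22)
    rows = x_log_x(k11 + k12) + x_log_x(k21 + k22)
    cols = x_log_x(k11 + k21) + x_log_x(k12 + k22)
    g2 = 2.0 * (cells + x_log_x(n) - rows - cols)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(g2, 0.0)


def p_value(g2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(g2 / 2.0))
```

For a table whose cells are exactly what independence predicts, such as
llr(10, 10, 10, 10), the statistic is zero up to round-off and the p-value is
about 1; a table with a dominant diagonal gives a large score and a p-value
near zero.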
