Thanks folks for taking a look.
I haven't sat down to try it yet, but I'm wondering how hard it is to
construct (realizable and realistic) k11, k12, k21, k22 values for three
binary sequences X, Y, Z where (X,Y) and (Y,Z) have the same cooccurrence
count (the same k11), but you can tweak k12 and k21 so that the LLR values
are extremely different in both directions. I assume that k22 doesn't
matter much in practice, since things are sparse and k22 is huge. Well, I
guess you could simply make k12 and k21 much larger for one sequence pair
than for the other to reorder the LLR values at will (merely swapping k12
with k21 changes nothing, though, since LLR is symmetric in those two
cells)... which is information that cooccurrence of course does not
"know about".
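To make this concrete for myself, here's a quick Python sketch using the
entropy form of the LLR (the same formula Mahout's LogLikelihood class
uses, as far as I can tell), with made-up illustrative counts: two pairs
share k11 = 100 cooccurrences, but one table is exactly independent
(LLR = 0) while the other is wildly significant.

```python
from math import log

def xlogx(x):
    # x * ln(x), with the 0 * ln(0) = 0 convention
    return x * log(x) if x > 0 else 0.0

def entropy(*counts):
    # unnormalized Shannon entropy of a list of counts
    total = sum(counts)
    return xlogx(total) - sum(xlogx(k) for k in counts)

def llr(k11, k12, k21, k22):
    # G^2 = 2 * (H(row sums) + H(column sums) - H(cells))
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

# Two hypothetical pairs with the same cooccurrence count k11 = 100:
# (X,Y): k11 * k22 == k12 * k21, i.e. exact independence, so LLR is 0
# (Y,Z): tiny k12/k21, so the 100 cooccurrences are highly significant
print(llr(100, 900, 900, 8100))  # ~0
print(llr(100, 5, 5, 8100))      # large

# Swapping k12 and k21 alone leaves LLR unchanged: the statistic is
# symmetric in those two cells, so only their magnitudes matter.
assert abs(llr(100, 5, 900, 8100) - llr(100, 900, 5, 8100)) < 1e-9
```

So cooccurrence alone ranks these two pairs identically, while LLR
separates them completely, which is the kind of divergence I was asking
about above.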
On Sat, Aug 17, 2013 at 10:30 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> This is nice. As you say, k11 is the only part that is used in
> cooccurrence and it doesn't weight by prevalence, either.
>
> At this size it is hard to demonstrate much difference, because it is
> hard to show interesting values of LLR without absurdly strong
> coordination between items.
>
>
> On Fri, Aug 16, 2013 at 8:21 PM, B Lyon <bradflyon@gmail.com> wrote:
>
> > As part of trying to get a better grip on recommenders, I have started a
> > simple interactive visualization that begins with the raw data of
> > user-item interactions and goes all the way to being able to twiddle the
> > interactions in a test user vector to see the impact on recommended
> > items. This is for the simple "user interacted with an item" case rather
> > than numerical preferences for items. The goal is to show the
> > intermediate pieces and how they fit together via popup text on
> > mouseovers and dynamic highlighting of the related pieces. I am of
> > course interested in feedback as I keep tweaking on it; I'm not sure I
> > got all the terminology quite right yet, for example, and might have
> > missed some other things I need to know about. Note that this material
> > is covered in Chapter 6.2 of MIA in the discussion of distributed
> > recommenders.
> >
> > It's on Google Drive here (very much a work-in-progress):
> >
> > https://googledrive.com/host/0B2GQktuwcTiWHRwZFJacjlqODA/
> >
> > (apologies to those on small-resolution screens)
> >
> > This is based only on the cooccurrence matrix, rather than including the
> > other similarity measures, although in working through this, it seems
> > that the other ones can just be interpreted as having alternative
> > definitions of what "*" means in the matrix multiplication A^T*A, where
> > A is the user-item matrix... and as an aside, this raises the
> > interesting question of [purely hypothetical?] situations where LLR and
> > cooccurrence are at odds with each other in making recommendations,
> > since cooccurrence seems to use just the "k11" term that is part of the
> > LLR calculation.
> >
> > My goal (at the moment at least) is to eventually continue this for the
> > solr-recommender project that started a few weeks ago, where we have the
> > additional cross-matrix, as well as a kind of regrouping of the pieces
> > for Solr.
> >
> >
> > --
> > BF Lyon
> > http://www.nowherenearithaca.com
> >
>
--
BF Lyon
http://www.nowherenearithaca.com
