Very close. You are conceptually exactly correct.
If A contains binary visit or view data, then A'A contains counts that must
be reduced to binary values or weights using some statistical procedure. I
prefer the log-likelihood ratio (LLR) test, with binary results.
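
A rough sketch of the binary case, in case it helps. It forms A'A from a toy binary history matrix, scores each item pair with the G^2 log-likelihood ratio on its 2x2 contingency table, and thresholds the scores to get binary results. The toy data, the function names, and the cutoff of 3.0 are all made up for illustration. Since the statistic is two-sided, the sketch also assumes you only want pairs that cooccur more often than chance.

```python
import math

def cooccurrence(A):
    """Return A'A (item x item) for A given as a list of user rows."""
    n_items = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n_items)]
            for i in range(n_items)]

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio statistic for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp rounding noise

def llr_scores(A):
    """Return (A'A, LLR score matrix) for a binary user x item matrix A."""
    C = cooccurrence(A)
    n_users, n_items = len(A), len(C)
    S = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                continue
            k11 = C[i][j]            # users with both items
            k12 = C[i][i] - k11      # users with item i only
            k21 = C[j][j] - k11      # users with item j only
            k22 = n_users - k11 - k12 - k21  # users with neither
            # Assumption: keep only positive associations, because the
            # statistic is also large for strongly anti-correlated pairs.
            if k11 * k22 <= k12 * k21:
                continue
            S[i][j] = llr(k11, k12, k21, k22)
    return C, S

# Toy binary history: 6 users (rows) x 3 items (columns).
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
C, S = llr_scores(A)

THRESHOLD = 3.0  # hypothetical cutoff, tune for real data
B = [[1 if s > THRESHOLD else 0 for s in row] for row in S]
```

Here C[0][1] is the raw cooccurrence count for items 0 and 1, and B marks the pairs that pass the test.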
If A contains counts weighted by inverse user frequency, then your dot
product is roughly usable as a similarity score. This is especially true if
rows of A are normalized somehow to account for overactive users.
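
A minimal sketch of the weighted case, under two assumptions of mine: "inverse user frequency" is taken to be the IDF-style item weight log(N / n_j), where N is the number of users and n_j is the number of users who touched item j, and the row normalization is plain L2 scaling. Other choices are reasonable. The data and function names are illustrative only.

```python
import math

def iuf_weight(A):
    """Scale column j by log(N / n_j); an assumed IDF-style weighting."""
    n_users, n_items = len(A), len(A[0])
    users_per_item = [sum(1 for row in A if row[j] > 0)
                      for j in range(n_items)]
    iuf = [math.log(n_users / c) if c > 0 else 0.0 for c in users_per_item]
    return [[row[j] * iuf[j] for j in range(n_items)] for row in A]

def normalize_rows(A):
    """L2-normalize each user row so heavy users do not dominate."""
    out = []
    for row in A:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out

def item_similarity(W):
    """Return W'W: dot products between item columns."""
    n_items = len(W[0])
    return [[sum(r[i] * r[j] for r in W) for j in range(n_items)]
            for i in range(n_items)]

# Toy count data: 5 users x 3 items; the fourth user is overactive.
counts = [
    [3, 1, 0],
    [1, 2, 0],
    [0, 0, 4],
    [20, 15, 10],
    [0, 1, 0],
]
W = normalize_rows(iuf_weight(counts))
S = item_similarity(W)
```

Note that under this weighting an item seen by every user gets weight log(1) = 0 and drops out entirely, which may or may not be what you want.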
On Wed, Sep 9, 2009 at 3:42 PM, Gökhan Çapan <gkhncpn@gmail.com> wrote:
> > A is the user x item history matrix. Each row is a user history.
> >
> > A' is the transposed user x item matrix, which has the shape item x
> > user.
> >
> > A' A is the user-level item cooccurrence matrix and has the shape item x
> > item.
> >
>
> Then (A' A)ij is a similarity weight between the ith and jth items.
> If Aij is the rating of the ith user for the jth item, the highest value
> in the ith row of A' A identifies the item most similar to the ith item.
> If the values in A are binary, then (A' A)ij is the number of users who
> have rated/clicked/viewed both item i and item j.
>

Ted Dunning, CTO
DeepDyve
