The matrix algebra is just a compact notation for a pattern of arithmetic
operations.
Let's actually take documents as rows and words as columns, since that is
the more common practice. If you look at the definition of the matrix
product A'A (where A' is the transpose of A, i.e. it is term by document
rather than document by term), then we get this:

    b_ij = sum_k a_ki a_kj
If all of the elements of A are binary, then the product inside the sum will
be 1 exactly where both a_ki and a_kj are 1. That means the sum is a count of
the documents which contain both word i and word j. This is the cooccurrence
count for words i and j. If A is not binary but is weighted, then this sum
is the weighted similarity between words i and j.
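To make that concrete, here is a small sketch in numpy (the corpus is made
up for illustration):

```python
import numpy as np

# Toy binary doc-term matrix: 4 documents (rows) x 3 words (columns).
# A[d, w] = 1 if word w occurs in document d.
A = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
])

# B = A'A is word by word.  B[i, j] = sum_k A[k, i] * A[k, j], which
# counts the documents containing both word i and word j.
B = A.T @ A

print(B)
# The diagonal entry B[i, i] is just the document frequency of word i.
```

Here words 0 and 1 appear together in two documents, so B[0, 1] comes out
as 2, while B[0, 0] is 3, the number of documents containing word 0.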
Repeating this trick with B = A'A, you find that B'B = (A'A)' (A'A) is a
term by term matrix that measures how similar the cooccurrence vectors are
for two words. If you expand it out, you will see that B'B is nothing but a
bunch of dot products (i.e. cosines of angles multiplied by magnitudes). You
may want to normalize the rows of A'A if you are using weighted arithmetic,
or sparsify A'A if you are using counts, but the pattern of operations is
the same.
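Continuing the sketch with a made-up binary matrix, the B'B step and the
cosine-normalized variant look like this (note that B is symmetric here, so
B'B is just BB):

```python
import numpy as np

# Same toy doc-term matrix: 4 documents x 3 words, binary.
A = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
])

B = A.T @ A   # word-by-word cooccurrence counts
C = B.T @ B   # C[i, j] is the dot product of the cooccurrence
              # vectors (rows of B) for words i and j

# To compare directions only, normalize the rows of B to unit length;
# the dot products then become cosines of the angles between the
# cooccurrence vectors.
Bn = B / np.linalg.norm(B, axis=1, keepdims=True)
cos = Bn @ Bn.T

print(cos)
```

With this data, words 0 and 1 have very similar cooccurrence vectors
(cosine near 0.95), while words 1 and 2 are much less similar.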
Again, there is nothing magical about the matrix notation. It is just a
compact way of describing a bunch of arithmetic. It also lets us tap into
a wealth of results that have been derived for linear algebra that we can
either misuse for our purposes or from which we can derive inspiration.
On Wed, Jun 24, 2009 at 6:02 AM, Paul Jones <paul_jonez99@yahoo.co.uk> wrote:
> Okay, aside from being confused with matrix algebra :), am confused
> with the "easy to implement using a doc x term matrix", i.e. not sure how a
> doc-term matrix would work out the similarity between words; is it not
> working out the occurrence of words in a doc? Maybe I am
> misunderstanding... Let's say I have a matrix built, where the docs are the
> columns, and the words are rows. Now my limited understanding from what I
> have read says that this matrix can be represented as a number of vectors,
> e.g. let's say we have one document, with 3 words, then the x/y/z axis will
> represent each word and its freq of occurrence, and hence the point in space
> forming the vector depicts this word related to that document.
>
> And this can be expanded. Now if we have 2 documents, with 2 more words, we
> have another point. The distance between them shows how similar they are,
> and hence how similar the documents are to each other.
>
> So far so good, but I am unsure how this translates into showing how similar
> the words themselves are, i.e. cooccurrence; would that not have to be a
> term-term matrix?
>

Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
8584140013 (m)
4087730220 (fax)
