Gram-Schmidt doesn't have to change vectors. You can view it as a way of
selecting from an infinite number of vectors in order to get an orthonormal
basis. The task of getting an interestingly diverse set of recommendations
is a bit different in that we only have a finite number of items to
recommend and in that orthonormality isn't really a concept that applies.
Another way to look at it is as greedy set cover.

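To make the set-cover view concrete, here is a minimal sketch. It assumes
the ideal recommendation is a set of wanted aspects and each item is a set
of aspects it covers; the function name greedy_cover, its parameters, and
the example data are all made up for illustration.

```python
def greedy_cover(wanted, items, k):
    """Pick up to k items, each covering the most still-uncovered aspects.

    wanted: set of aspects the ideal recommendation should satisfy (assumed).
    items:  dict mapping item name -> set of aspects that item covers (assumed).
    """
    uncovered = set(wanted)
    remaining = dict(items)
    chosen = []
    while uncovered and remaining and len(chosen) < k:
        # Item that covers the largest number of aspects not yet covered.
        best = max(remaining, key=lambda name: len(uncovered & remaining[name]))
        if not uncovered & remaining[best]:
            break  # nothing left adds any coverage
        chosen.append(best)
        uncovered -= remaining.pop(best)
    return chosen


# Hypothetical data: "b" is redundant with "a", "c" adds something new.
wanted = {"jazz", "piano", "live", "vinyl"}
items = {
    "a": {"jazz", "piano", "live"},
    "b": {"jazz", "piano"},
    "c": {"vinyl", "rock"},
}
picked = greedy_cover(wanted, items, 2)
```

With this data the second pick is "c" rather than "b", since "b" adds
nothing that "a" did not already cover.
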
If you view the idealized recommendation as a vector r, then you can select
the most colinear item x_0 as the best recommendation. There will be some
aspects of r, however, that x_0 does not satisfy. Thus, you can take x_1 to
be the item most colinear with (r - x_0). If r and x_i are binary vectors,
then subtraction must be more like set subtraction, but if it is a
reduced dimensional representation like that from LSA, normal subtraction
may work with some renormalization. x_2 can then be the item most colinear
with (r - x_0 - x_1). You may want to take two or more items at each step
before looking for diversity.

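A rough sketch of that residual idea for dense vectors follows. It makes
two assumptions of mine: "most colinear" is measured by cosine similarity,
and the "renormalization" is done by removing the projection of the
residual onto the chosen item instead of subtracting the raw item vector.
The names cosine and greedy_residual_select are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def greedy_residual_select(r, items, k):
    """Pick up to k item names, each most colinear with what is left of r.

    r:     list of floats, the idealized recommendation.
    items: dict mapping item name -> list of floats (same length as r).
    """
    residual = list(r)
    remaining = dict(items)
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda name: cosine(residual, remaining[name]))
        x = remaining.pop(best)
        chosen.append(best)
        # Assumption: remove the part of the residual that x explains
        # (projection onto x), one possible reading of "r - x_0".
        xx = sum(a * a for a in x)
        if xx > 0.0:
            scale = sum(a * b for a, b in zip(residual, x)) / xx
            residual = [a - scale * b for a, b in zip(residual, x)]
    return chosen


# Hypothetical data: "b" is nearly a duplicate of "a"; "c" is different.
r = [2.0, 1.0, 0.0]
items = {"a": [1.0, 0.0, 0.0], "b": [1.0, 0.0, 0.1], "c": [0.0, 1.0, 0.0]}
picked = greedy_residual_select(r, items, 2)
```

Ranking by similarity to r alone would return "a" and then "b" here. After
"a" is removed from r, the residual points along the second axis, so "c"
is picked second.
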
Another fairly direct method is to just use a threshold where you iterate
down the list of items that are similar to r and take elements that are at
least a certain minimum dissimilarity relative to all previously selected
items.

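The threshold method might look something like this. It is only a sketch:
I am assuming cosine similarity as the similarity measure and
1 - cosine as the dissimilarity, and the names threshold_select and
min_dissimilarity are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def threshold_select(r, items, min_dissimilarity, k):
    """Walk items from most to least similar to r, keeping an item only if
    it is at least min_dissimilarity away from every item already kept."""
    ranked = sorted(items, key=lambda name: cosine(r, items[name]), reverse=True)
    chosen = []
    for name in ranked:
        far_enough = all(
            1.0 - cosine(items[name], items[c]) >= min_dissimilarity
            for c in chosen
        )
        if far_enough:
            chosen.append(name)
            if len(chosen) == k:
                break
    return chosen


# Hypothetical data: "b" is nearly a duplicate of "a"; "c" is different.
r = [2.0, 1.0, 0.0]
items = {"a": [1.0, 0.0, 0.0], "b": [1.0, 0.0, 0.1], "c": [0.0, 1.0, 0.0]}
diverse = threshold_select(r, items, 0.5, 2)
plain = threshold_select(r, items, 0.0, 2)
```

With a threshold of 0.5 the near-duplicate "b" is skipped in favor of "c".
With a threshold of 0.0 this degenerates to the plain top-k list.
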
Analogies from linear algebra can be very misleading for this sort of work
(this is a borderline case), or very helpful (like with LSA). Usually what
you need to do is take the analogy with a big grain of salt and then
reimagine the problem. A good example is the interpretation of LSA, LDA,
MDCA and many other techniques as matrix decompositions under different
probabilistic assumptions.

On Sun, Jun 21, 2009 at 6:01 PM, Sean Owen <srowen@gmail.com> wrote:
> I am particularly intrigued at the moment by this last question, of
> how to pick a sample of very different items. Is the idea here that
> you look at items as vectors of preferences, and try to find the
> most-orthogonal subset of them? Gram-Schmidt would be changing the
> vectors rather than selecting them, so I am curious how these two
> things connect. It is a really good problem I think.
>

Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
8584140013 (m)
4087730220 (fax)
