mahout-user mailing list archives

From Ted Dunning <>
Subject Re: [Taste] Sanity Check and Questions
Date Mon, 22 Jun 2009 01:23:15 GMT
Gram-Schmidt doesn't have to change vectors.  You can view it as a way of
selecting from an infinite number of vectors in order to get an orthonormal
basis.  The task of getting an interestingly diverse set of recommendations
is a bit different in that we only have a finite number of items to
recommend and in that orthonormality isn't really a meaningful concept
there.  Another way to look at it is as greedy set cover.

If you view the idealized recommendation as a vector r, then you can select
the most colinear item x_0 as the best recommendation.  There will be some
aspects of r, however, that x_0 does not satisfy.  Thus, you can take x_1 to
be the item most colinear with (r - x_0).  If r and x_i are binary vectors,
then subtraction must behave more like set subtraction, but if they are a
reduced-dimensional representation like that from LSA, normal subtraction
may work with some renormalization.  x_2 can then be the item most colinear
with (r - x_0 - x_1).  You may want to take two or more items at each step
before looking for diversity.
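
A minimal sketch of the greedy selection described above, assuming dense
real-valued vectors (for example LSA-style) and cosine similarity as the
measure of collinearity.  The names cosine and greedy_diverse are
hypothetical, and removing the projection of the residual onto the chosen
item is only one possible reading of "normal subtraction with some
renormalization"; it is an assumption, not something stated in this thread.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def greedy_diverse(r, items, k):
    """Pick up to k item names, each most collinear with what is left of r.

    r     -- idealized recommendation vector (list of floats)
    items -- dict mapping item name to vector (same length as r)
    """
    residual = list(r)
    remaining = dict(items)
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda name: cosine(residual, remaining[name]))
        x = remaining.pop(best)
        chosen.append(best)
        # Assumption: instead of a raw (r - x), remove the component of the
        # residual that x already explains, so the scale of x does not matter.
        xx = sum(v * v for v in x)
        if xx:
            coef = sum(rv * xv for rv, xv in zip(residual, x)) / xx
            residual = [rv - coef * xv for rv, xv in zip(residual, x)]
    return chosen

# Hypothetical toy data: "a" and "b" are near-duplicates, "c" is different.
example_items = {"a": [1.0, 0.0, 0.0], "b": [1.0, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
print(greedy_diverse([2.0, 1.0, 0.0], example_items, 2))  # ['b', 'c']
```

Ranking purely by similarity to r would return b and then a here; the
residual step is what moves c ahead of the near-duplicate a.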

Another fairly direct method is to just use a threshold where you iterate
down the list of items that are similar to r and take elements that are at
least a certain minimum dissimilarity relative to all previously selected
items.
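
The threshold method can be sketched the same way.  This assumes that
dissimilarity is measured as 1 minus cosine similarity, which is a choice
made only for illustration; the function name threshold_diverse and the
parameter min_dissimilarity are likewise hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def threshold_diverse(r, items, k, min_dissimilarity):
    """Walk items in order of similarity to r, keeping an item only if it is
    at least min_dissimilarity away from every item already kept."""
    ranked = sorted(items, key=lambda name: cosine(r, items[name]), reverse=True)
    chosen = []
    for name in ranked:
        # Assumption: dissimilarity = 1 - cosine similarity.
        far_enough = all(
            1.0 - cosine(items[name], items[kept]) >= min_dissimilarity
            for kept in chosen
        )
        if far_enough:
            chosen.append(name)
        if len(chosen) == k:
            break
    return chosen

# Hypothetical toy data: "a" and "b" are near-duplicates, "c" is different.
example_items = {"a": [1.0, 0.0, 0.0], "b": [1.0, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
print(threshold_diverse([2.0, 1.0, 0.0], example_items, 2, 0.1))  # ['b', 'c']
```

With min_dissimilarity set to 0.0 this degenerates to a plain top-k list,
so the threshold is the single knob trading relevance against diversity.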

Analogies from linear algebra can be very misleading for this sort of work
(this is a borderline case), or very helpful (like with LSA).  Usually what
you need to do is take the analogy with a big grain of salt and then
re-imagine the problem.  A good example is the interpretation of LSA, LDA,
MDCA and many other techniques as matrix decompositions under different
probabilistic assumptions.

On Sun, Jun 21, 2009 at 6:01 PM, Sean Owen <> wrote:

> I am particularly intrigued at the moment by this last question, of
> how to pick a sample of very different items. Is the idea here that
> you look at items as vectors of preferences, and try to find the
> most-orthogonal subset of them? Gram-Schmidt would be changing the
> vectors rather than selecting them, so I am curious how these two
> things connect. It is a really good problem I think.

Ted Dunning, CTO

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
858-414-0013 (m)
408-773-0220 (fax)
