2010/11/23 Fernando Fernández <fernando.fernandez.gonzalez@gmail.com>
> Hi Jake,
>
> You say that LanczosSolver leaves the eigenvalues in the reverse order
> that I expect, but at least the eigenvectors are given in the same
> order, aren't they? I mean, if I want the top N eigenvalues and
> eigenvectors, and I ask for 4N, I should take the last N eigenvalues
> and the last N eigenvectors, right?
>
The eigenvectors should be paired up with their values, yes. If you look at
the class org.apache.mahout.math.hadoop.decomposer.EigenVerificationJob,
you'll find the way I typically run the algorithm. I notice, looking at that,
that it's not set up to handle symmetric matrices properly: it always assumes
the input matrix is asymmetric and that you're looking for singular vectors
(not eigenvectors of a symmetric matrix).
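To see why the distinction matters less than it might seem, here's a rough
sketch in plain Java (not using the Mahout API; the matrix and values are
made up for illustration). For a symmetric matrix A with eigenpair
(lambda, v), the same v is a right singular vector with sigma = |lambda|,
since A^t A v = lambda^2 v:

```java
public class SymmetricCheck {
    /** Dense matrix-vector product: r = a * v. */
    static double[] multiply(double[][] a, double[] v) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < v.length; j++)
                r[i] += a[i][j] * v[j];
        return r;
    }

    public static void main(String[] args) {
        // Symmetric 2x2 example with eigenpair lambda = 3, v = (1,1)/sqrt(2).
        double[][] a = {{2, 1}, {1, 2}};
        double s = Math.sqrt(0.5);
        double[] v = {s, s};
        double[] av = multiply(a, v);     // eigenvector check: A v = 3 v
        double[] atAv = multiply(a, av);  // singular-vector check: A^t A v = 9 v
        System.out.printf("%.3f %.3f%n", av[0] / v[0], atAv[0] / v[0]);
        // prints "3.000 9.000"
    }
}
```

So for a symmetric input, treating the output as singular vectors only loses
the sign of the eigenvalues, not the vectors themselves.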
> I also read that there is an issue regarding the number of eigenvalues
> and eigenvectors that are returned when you ask LanczosSolver for N of
> them, and I have checked that if I ask for 40, for example, I get only
> 39. Will this leave out the biggest eigenvector and eigenvalue, or the
> smallest one, or neither? I spent the last weeks reading about this in
> the discussions but it's not very clear to me yet.
>
Your last eigenvector will be the largest one, and the first will be the
smallest. You are getting one rank less than you asked for, however, and
that seems to be because the code is treating your initial guess vector
(the one you feed into the algorithm to start the iterative matrix
multiplication) as one of the possible eigenvectors (which it most
certainly is not). That might not be the issue, but either way: if you
want the biggest 10 eigenvectors, ask for 30 or 40 and take the last 10,
and you'll have the biggest 10 (not the 2nd-biggest through 11th-biggest).
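As a minimal sketch of that selection (plain Java, assuming the solver's
output is sorted ascending by eigenvalue, smallest first and largest last,
as described above; the class and values are made up for illustration):

```java
import java.util.Arrays;

public class TopEigenpairs {
    /** Pick the largest n eigenvalues from output sorted ascending
     *  (smallest first, largest last), e.g. Lanczos output. The paired
     *  eigenvectors would be selected with the same indices. */
    static double[] topN(double[] ascendingEigenvalues, int n) {
        int k = ascendingEigenvalues.length;
        return Arrays.copyOfRange(ascendingEigenvalues, k - n, k);
    }

    public static void main(String[] args) {
        // E.g. you asked for 40, got 39 converged values back, and want the top 10:
        // take the last 10 of the 39, not the first 10.
        double[] converged = {0.1, 0.4, 0.9, 1.2, 2.5, 3.3, 4.0, 5.7, 6.1, 7.8, 8.2, 9.9};
        System.out.println(Arrays.toString(topN(converged, 3)));
        // prints "[7.8, 8.2, 9.9]"
    }
}
```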
> Could you give us some help?
>
> By the way, I'm interested in having Uk and also V*S^(-1), so I'm able
> to project new rows into the U space...
>
That's a simple matter of doing some extra matrix multiplications, right?
The output of the current implementation of Lanczos is K vectors with
dimensionality equal to M = numColumns(inputMatrix), which naturally form
a dense, row-partitioned K x M matrix = V^t. If you scale each of these
singular vectors by 1/(their singular value), you'll have a set of K
vectors suitable for projecting new rows onto the reduced-dimensional
space. (Of course, you also need to project all of the original rows onto
this space too, because you have V^t, not U. In fact, if every matrix is
being represented as a row matrix, you don't really want U, because
that's a matrix of K singular vectors, each having numRows(inputMatrix)
values; you want numRows vectors, each having K values: U^t.)
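That projection can be sketched in a few lines of plain Java (not the
Mahout API; the class name and values are made up for illustration). A new
row x of length M folds into the K-dimensional space via
u[k] = dot(x, v[k]) / sigma[k]:

```java
import java.util.Arrays;

public class FoldIn {
    /** Project a new row x (length M) onto the K-dim reduced space:
     *  u[k] = dot(x, vt[k]) / sigma[k], where vt[k] are the rows of V^t
     *  (right singular vectors) and sigma[k] the singular values. */
    static double[] project(double[] x, double[][] vt, double[] sigma) {
        double[] u = new double[vt.length];
        for (int k = 0; k < vt.length; k++) {
            double dot = 0.0;
            for (int j = 0; j < x.length; j++)
                dot += x[j] * vt[k][j];
            u[k] = dot / sigma[k];
        }
        return u;
    }

    public static void main(String[] args) {
        // Toy example: M = 2, K = 2, V^t = identity, sigma = (2, 4).
        double[][] vt = {{1, 0}, {0, 1}};
        double[] sigma = {2, 4};
        double[] newRow = {2, 4};
        System.out.println(Arrays.toString(project(newRow, vt, sigma)));
        // prints "[1.0, 1.0]"
    }
}
```

The same routine applied to each original row gives you the rows of U
(i.e. U^t stored row-wise), which is what the reply above recommends.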
Does that help?
jake
