mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefan Wienert <ste...@wienert.cc>
Subject Re: Need a little help with SVD / Dimensional Reduction
Date Mon, 06 Jun 2011 19:08:11 GMT
Yeha :) That's what I assumed. So this problem is solved. Thanks!

And now your questions:

I want to do LSA/LSI with Mahout. Therefore I take the TDIDF
Document-Term-Matrix and reduce it using Lanczos. So I get the
Document-Concept-Vectors. These Vectors will be compared with cosine
similarity to find similar documents. I can also cluster these
documents with k-means...

If you have any suggestions, feel free to tell me. I'm also interested
in other document-similarity-techniques.

Cheers,
Stefan

2011/6/6 Danny Bickson <danny.bickson@gmail.com>:
> If I understand your question correctly, you need to simply transpose M
> before you start the run and that way
> you will get the other singular vectors.
>
> May I ask what is the problem you are working on and why do you need the
> singular vectors?
> Can you consider using another matrix decomposition technique for example
> alternating least squares
> which gives you two lower rank matrices which simulates the large decomposed
> matrix?
>
> On Mon, Jun 6, 2011 at 1:30 PM, Stefan Wienert <stefan@wienert.cc> wrote:
>
>> Hi Danny!
>>
>> I understand that for M*M' (and for M'*M) the left and right
>> eigenvectors are identical. But that is not exactly what I want. The
>> lanczos solver from mahout gives me the eigenvectors of M*M', which
>> are the left singular vectors of M. But I need the right singular
>> vectors of M (and not M*M'). How do I get them?
>>
>> Sorry, my matrix math is not as good as it should be, but I hope you
>> can help me!
>>
>> Thanks,
>> Stefan
>>
>> 2011/6/6 Danny Bickson <danny.bickson@gmail.com>:
>> > Hi Stefan!
>> > For a positive semidefinite matrix, the lest and right eigenvectors are
>> > identical.
>> > See SVD wikipeida text: When *M* is also positive
>> > semi-definite<http://en.wikipedia.org/wiki/Positive-definite_matrix>,
>> > the decomposition *M* = *U**D**U* * is also a singular value
>> decomposition.
>> > So you don't need to be worried about the other singular vectors.
>> >
>> > Hope this helps!
>> >
>> > On Mon, Jun 6, 2011 at 12:57 PM, Stefan Wienert <stefan@wienert.cc>
>> wrote:
>> >
>> >> Hi.
>> >>
>> >> Thanks for the help.
>> >>
>> >> The important points from wikipedia are:
>> >> - The left singular vectors of M are eigenvectors of M*M' .
>> >> - The right singular vectors of M are eigenvectors of M'*M.
>> >>
>> >> as you describe, the mahout lanczos solver calculate A=M'*M (I think
>> >> it does A=M*M', but it is not a problem). Therefore it does already
>> >> calculate the right (or left) singular vector of M.
>> >>
>> >> But my question is, how can I get the other singular vector? I can
>> >> transpose M, but then I have to calculated two SVDs, one for the right
>> >> and one for the left singular value... I think there is a better way
>> >> :)
>> >>
>> >> Hope you can help me with this...
>> >> Thanks
>> >> Stefan
>> >>
>> >>
>> >> 2011/6/6 Danny Bickson <danny.bickson@gmail.com>:
>> >> > Hi
>> >> > Mahout SVD implementation computes the Lanzcos iteration:
>> >> > http://en.wikipedia.org/wiki/Lanczos_algorithm
>> >> > Denote the non-square input matrix as M. First a symmetric matrix A
is
>> >> > computed by A=M'*M
>> >> > Then an approximating tridiagonal matrix T and a vector matrix V are
>> >> > computed such that A =~ V*T*V'
>> >> > (this process is done in a distributed way).
>> >> >
>> >> > Next the matrix T is next decomposed into eigenvectors and
>> eignevalues.
>> >> > Which is the returned result. (This process
>> >> > is serial).
>> >> >
>> >> > The third step makes the returned eigenvectors orthogonal to each
>> other
>> >> > (which is optional IMHO).
>> >> >
>> >> > The heart of the code is found at:
>> >> >
>> >>
>> ./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java
>> >> > At least that is where it was in version 0.4 I am not sure if there
>> are
>> >> > changes in version 0.5
>> >> >
>> >> > Anyway, Mahout does not compute directly SVD. If you are interested
in
>> >> > learning more about the relation to SVD
>> >> > look at: http://en.wikipedia.org/wiki/Singular_value_decomposition,
>> >> > subsection: relation to eigenvalue decomposition.
>> >> >
>> >> > Hope this helps,
>> >> >
>> >> > Danny Bickson
>> >> >
>> >> > On Mon, Jun 6, 2011 at 9:35 AM, Stefan Wienert <stefan@wienert.cc>
>> >> wrote:
>> >> >
>> >> >> After reading this thread:
>> >> >>
>> >> >>
>> >>
>> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTinQ5K4XrM7naBWn8qoBXZGVobBot2RtjZSV4yOd@mail.gmail.com%3E
>> >> >>
>> >> >> Wiki-SVD: M = U S V* (* = transposed)
>> >> >>
>> >> >> The output of Mahout-SVD is (U S) right?
>> >> >>
>> >> >> So... How do I get V from (U S)  and M?
>> >> >>
>> >> >> Is V = M (U S)* (because this is, what the calculation in the example
>> >> is)?
>> >> >>
>> >> >> Thanks
>> >> >> Stefan
>> >> >>
>> >> >> 2011/6/6 Stefan Wienert <stefan@wienert.cc>:
>> >> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
>> >> >> >
>> >> >> > What is done:
>> >> >> >
>> >> >> > Input:
>> >> >> > tf-idf-matrix (docs x terms) 6076937 x 20444
>> >> >> >
>> >> >> > "SVD" of tf-idf-matrix (rank 100) produces the eigenvector
(and
>> >> >> > eigenvalues) of tf-idf-matrix, called:
>> >> >> > svd (concepts x terms) 87 x 20444
>> >> >> >
>> >> >> > transpose tf-idf-matrix:
>> >> >> > tf-idf-matrix-transpose (terms x docs) 20444 x 6076937
>> >> >> >
>> >> >> > transpose svd:
>> >> >> > svd-transpose (terms x concepts) 20444 x 87
>> >> >> >
>> >> >> > matrix multiply:
>> >> >> > tf-idf-matrix-transpose x svd-transpose = result
>> >> >> > (terms x docs) x (terms x concepts) = (docs x concepts)
>> >> >> >
>> >> >> > so... I do understand, that the "svd" here is not SVD from
>> wikipedia.
>> >> >> > It only does the Lanczos algorithm and some magic which produces
>> the
>> >> >> >> Instead either the left or right (but usually the right)
>> eigenvectors
>> >> >> premultiplied by the diagonal or the square root of the
>> >> >> >> diagonal element.
>> >> >> > from
>> >> >>
>> >>
>> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTi=Rta7tfRm8Zi60VcFya5xF+dbFrJ8pcds2N0-V@mail.gmail.com%3E
>> >> >> >
>> >> >> > so my question: what is the output of the SVD in mahout. And
what
>> do I
>> >> >> > have to calculate to get the "right singular value" from svd?
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Stefan
>> >> >> >
>> >> >> > 2011/6/6 Stefan Wienert <stefan@wienert.cc>:
>> >> >> >>
>> >> >>
>> >>
>> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
>> >> >> >>
>> >> >> >> the last step is the matrix multiplication:
>> >> >> >>  --arg --numRowsA --arg 20444 \
>> >> >> >>  --arg --numColsA --arg 6076937 \
>> >> >> >>  --arg --numRowsB --arg 20444 \
>> >> >> >>  --arg --numColsB --arg 87 \
>> >> >> >> so the result is a 6,076,937 x 87 matrix
>> >> >> >>
>> >> >> >> the input has 6,076,937 (each with 20,444 terms). so the
result of
>> >> >> >> matrix multiplication has to be the right singular value
regarding
>> to
>> >> >> >> the dimensions.
>> >> >> >>
>> >> >> >> so the result is the "concept-document vector matrix"
(as I think,
>> >> >> >> these is also called "document vectors" ?)
>> >> >> >>
>> >> >> >> 2011/6/6 Ted Dunning <ted.dunning@gmail.com>:
>> >> >> >>> Yes.  These are term vectors, not document vectors.
>> >> >> >>>
>> >> >> >>> There is an additional step that can be run to produce
document
>> >> >> vectors.
>> >> >> >>>
>> >> >> >>> On Sun, Jun 5, 2011 at 1:16 PM, Stefan Wienert <stefan@wienert.cc
>> >
>> >> >> wrote:
>> >> >> >>>
>> >> >> >>>> compared to SVD, is the result is the "right singular
value"?
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> Stefan Wienert
>> >> >> >>
>> >> >> >> http://www.wienert.cc
>> >> >> >> stefan@wienert.cc
>> >> >> >>
>> >> >> >> Telefon: +495251-2026838
>> >> >> >> Mobil: +49176-40170270
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Stefan Wienert
>> >> >> >
>> >> >> > http://www.wienert.cc
>> >> >> > stefan@wienert.cc
>> >> >> >
>> >> >> > Telefon: +495251-2026838
>> >> >> > Mobil: +49176-40170270
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Stefan Wienert
>> >> >>
>> >> >> http://www.wienert.cc
>> >> >> stefan@wienert.cc
>> >> >>
>> >> >> Telefon: +495251-2026838
>> >> >> Mobil: +49176-40170270
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Stefan Wienert
>> >>
>> >> http://www.wienert.cc
>> >> stefan@wienert.cc
>> >>
>> >> Telefon: +495251-2026838
>> >> Mobil: +49176-40170270
>> >>
>> >
>>
>>
>>
>> --
>> Stefan Wienert
>>
>> http://www.wienert.cc
>> stefan@wienert.cc
>>
>> Telefon: +495251-2026838
>> Mobil: +49176-40170270
>>
>



-- 
Stefan Wienert

http://www.wienert.cc
stefan@wienert.cc

Telefon: +495251-2026838
Mobil: +49176-40170270

Mime
View raw message