mahout-user mailing list archives

From Stefan Wienert <ste...@wienert.cc>
Subject Re: Need a little help with SVD / Dimensional Reduction
Date Mon, 06 Jun 2011 16:58:19 GMT
Hi.

Thanks for the help.

The important points from Wikipedia are:
- The left singular vectors of M are eigenvectors of M*M'.
- The right singular vectors of M are eigenvectors of M'*M.
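Both statements follow directly from the SVD definition M = U S V' together with U'U = I and V'V = I (thin SVD):

```latex
\begin{aligned}
M M^\top &= U S V^\top V S U^\top = U S^2 U^\top \\
M^\top M &= V S U^\top U S V^\top = V S^2 V^\top
\end{aligned}
```

So the columns of U and V are eigenvectors of M*M' and M'*M respectively, and in both cases the eigenvalues are the squared singular values.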

As you describe, the Mahout Lanczos solver calculates A=M'*M (I think
it actually does A=M*M', but that is not a problem). Therefore it already
calculates the right (or left) singular vectors of M.

But my question is: how can I get the other singular vectors? I could
transpose M, but then I would have to calculate two SVDs, one for the right
and one for the left singular vectors... I think there is a better way
:)
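One standard way to get the other side without a second decomposition: once you have an eigenpair (lambda_i, v_i) of M'*M, the singular value is sigma_i = sqrt(lambda_i) and the matching left singular vector is u_i = M*v_i / sigma_i. The sketch below checks this on a toy 3 x 2 matrix. It assumes the returned eigenvalues are those of M'*M, and it uses plain power iteration as a stand-in for Mahout's Lanczos solver; all function names are hypothetical.

```python
import math

def mat_vec(m, v):
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def mat_mul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def top_eigenpair(a, iters=500):
    """Power iteration on a symmetric PSD matrix (stand-in for the Lanczos solver)."""
    v = [float(i + 1) for i in range(len(a))]
    for _ in range(iters):
        w = mat_vec(a, v)
        n = math.sqrt(sum(x * x for x in w))
        v = [x / n for x in w]
    lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient
    return lam, v

def left_from_right(m, v, lam):
    """Given an eigenpair (lam, v) of M'M, return sigma and u = M v / sigma."""
    sigma = math.sqrt(lam)
    return sigma, [x / sigma for x in mat_vec(m, v)]

# Toy matrix: 3 "documents" x 2 "terms"
M = [[3.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
lam, v = top_eigenpair(mat_mul(transpose(M), M))  # right singular vector of M
sigma, u = left_from_right(M, v, lam)             # matching left singular vector
print(sigma, u)
```

The same division by sigma_i works column by column for every returned pair, so one matrix product M*V followed by scaling gives all left singular vectors at once.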

Hope you can help me with this...
Thanks
Stefan


2011/6/6 Danny Bickson <danny.bickson@gmail.com>:
> Hi
> The Mahout SVD implementation computes the Lanczos iteration:
> http://en.wikipedia.org/wiki/Lanczos_algorithm
> Denote the non-square input matrix as M. First a symmetric matrix A is
> computed by A=M'*M
> Then an approximating tridiagonal matrix T and a vector matrix V are
> computed such that A =~ V*T*V'
> (this process is done in a distributed way).
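To make the A =~ V*T*V' step concrete, here is a minimal serial sketch of the plain Lanczos three-term recurrence. It is not Mahout's LanczosSolver, which runs the A*q products as distributed jobs. The function names and the toy matrix are illustrative assumptions, and this version does no reorthogonalization.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def lanczos(a, start, k):
    """Plain Lanczos on a symmetric matrix `a` (no reorthogonalization).

    Returns (alphas, betas, basis): the diagonal and off-diagonal of the
    tridiagonal T and the Lanczos vectors (the columns of V).
    """
    n = len(a)
    s = math.sqrt(dot(start, start))
    q = [x / s for x in start]
    q_prev = [0.0] * n
    beta = 0.0
    alphas, betas, basis = [], [], [q]
    for j in range(k):
        w = [dot(row, q) for row in a]          # w = A q_j
        alpha = dot(w, q)
        alphas.append(alpha)
        # three-term recurrence: remove components along q_j and q_{j-1}
        w = [wi - alpha * qi - beta * pi for wi, qi, pi in zip(w, q, q_prev)]
        if j == k - 1:
            break
        beta = math.sqrt(dot(w, w))
        if beta < 1e-12:                        # invariant subspace found: stop early
            break
        betas.append(beta)
        q_prev, q = q, [x / beta for x in w]
        basis.append(q)
    return alphas, betas, basis

def tridiagonal(alphas, betas):
    """Assemble the symmetric tridiagonal matrix T."""
    k = len(alphas)
    t = [[0.0] * k for _ in range(k)]
    for i in range(k):
        t[i][i] = alphas[i]
    for i, b in enumerate(betas):
        t[i][i + 1] = t[i + 1][i] = b
    return t

A = [[10.0, 6.0, 4.0], [6.0, 10.0, 4.0], [4.0, 4.0, 2.0]]
alphas, betas, basis = lanczos(A, [1.0, 0.0, 0.0], k=3)
T = tridiagonal(alphas, betas)
```

With k equal to the matrix size, V'*A*V reproduces T up to rounding. For k much smaller than the dimension (the realistic case), T only approximates the extreme part of A's spectrum.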
>
> Next, the matrix T is decomposed into eigenvectors and eigenvalues,
> which are the returned result. (This step is serial.)
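In the textbook Lanczos method, the eigenvectors y_i of the small matrix T are mapped back through the Lanczos basis V to approximate eigenvectors of A (Ritz vectors). Whether Mahout applies exactly this mapping is an assumption here, but the relation is:

```latex
T y_i = \theta_i y_i,\quad V^\top V = I,\quad A \approx V T V^\top
\;\Longrightarrow\;
A (V y_i) \approx V T V^\top V y_i = V T y_i = \theta_i (V y_i)
```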
>
> The third step makes the returned eigenvectors orthogonal to each other
> (which is optional IMHO).
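The thread does not say which method Mahout uses for this orthogonalization step. Assuming a Gram-Schmidt-style pass, a minimal sketch looks like this; the function name and the tolerance are hypothetical.

```python
import math

def orthonormalize(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal vectors spanning the same space.

    Vectors that are (numerically) linear combinations of earlier ones are dropped.
    """
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))  # component along b
            w = [x - proj * y for x, y in zip(w, b)]
        n = math.sqrt(sum(x * x for x in w))
        if n > tol:
            basis.append([x / n for x in w])
    return basis

vecs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [2.0, 1.0, 1.0]]  # third = first + second
basis = orthonormalize(vecs)
```

The "modified" variant subtracts each projection from the running remainder rather than from the original vector, which loses less orthogonality in floating point.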
>
> The heart of the code is found at:
> ./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java
> At least that is where it was in version 0.4; I am not sure whether it
> changed in version 0.5.
>
> Anyway, Mahout does not compute the SVD directly. If you are interested in
> learning more about its relation to the SVD,
> look at: http://en.wikipedia.org/wiki/Singular_value_decomposition,
> subsection "Relation to eigenvalue decomposition".
>
> Hope this helps,
>
> Danny Bickson
>
> On Mon, Jun 6, 2011 at 9:35 AM, Stefan Wienert <stefan@wienert.cc> wrote:
>
>> After reading this thread:
>>
>> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTinQ5K4XrM7naBWn8qoBXZGVobBot2RtjZSV4yOd@mail.gmail.com%3E
>>
>> Wiki-SVD: M = U S V* (* = transposed)
>>
>> The output of Mahout-SVD is (U S) right?
>>
>> So... How do I get V from (U S)  and M?
>>
>> Is V = M (U S)* (because this is what the calculation in the example does)?
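Writing out M = U S V' with U'U = V'V = I and all retained singular values nonzero gives the relations between the factors. Note that M (U S)' does not conform dimensionally when M is docs x terms, whereas M'(U S) does:

```latex
\begin{aligned}
M V &= U S V^\top V = U S &&\Longrightarrow\quad U = M V S^{-1} \\
M^\top U &= V S U^\top U = V S &&\Longrightarrow\quad V = M^\top U S^{-1} \\
M^\top (U S) &= V S^2
\end{aligned}
```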
>>
>> Thanks
>> Stefan
>>
>> 2011/6/6 Stefan Wienert <stefan@wienert.cc>:
>> > https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
>> >
>> > What is done:
>> >
>> > Input:
>> > tf-idf-matrix (docs x terms) 6076937 x 20444
>> >
>> > "SVD" of tf-idf-matrix (rank 100) produces the eigenvectors (and
>> > eigenvalues) of tf-idf-matrix, called:
>> > svd (concepts x terms) 87 x 20444
>> >
>> > transpose tf-idf-matrix:
>> > tf-idf-matrix-transpose (terms x docs) 20444 x 6076937
>> >
>> > transpose svd:
>> > svd-transpose (terms x concepts) 20444 x 87
>> >
>> > matrix multiply:
>> > tf-idf-matrix-transpose x svd-transpose = result
>> > (terms x docs) x (terms x concepts) = (docs x concepts)
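The dimension bookkeeping above only works out if the multiplication job computes A'*B rather than A*B. The --numRowsA/--numRowsB arguments quoted further down in the thread (both 20444) point the same way. As far as I know, Mahout's DistributedRowMatrix.times() documents this transpose-times convention, but treat that as an assumption to check against your version. The hypothetical helper transpose_times() mimics it on toy data:

```python
def transpose_times(a, b):
    """Return a' * b for row-major lists: a is r x m, b is r x n, result is m x n.

    Hypothetical helper mirroring an A'*B multiplication convention.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same number of rows")
    m, n = len(a[0]), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for row_a, row_b in zip(a, b):  # one pass over the shared rows (terms)
        for i, x in enumerate(row_a):
            if x:                   # skip zeros, as a sparse row would
                for j, y in enumerate(row_b):
                    out[i][j] += x * y
    return out

# Toy sizes: tf-idf transposed is 4 terms x 3 docs, svd transposed is 4 terms x 2 concepts
tfidf_t = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
svd_t = [[0.5, 0.1], [0.2, 0.3], [0.0, 0.4], [0.1, 0.0]]
doc_concepts = transpose_times(tfidf_t, svd_t)  # 3 docs x 2 concepts
```

Iterating over shared rows is what makes this convention convenient for row-partitioned data: each row of A and B can be combined locally and the partial products summed.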
>> >
>> > So... I do understand that the "svd" here is not the SVD from Wikipedia.
>> > It only does the Lanczos algorithm and some magic which produces the
>> > >> Instead either the left or right (but usually the right) eigenvectors
>> > >> premultiplied by the diagonal or the square root of the
>> > >> diagonal element.
>> > from
>> > http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTi=Rta7tfRm8Zi60VcFya5xF+dbFrJ8pcds2N0-V@mail.gmail.com%3E
>> >
>> > So my question: what is the output of the SVD in Mahout? And what do I
>> > have to calculate to get the "right singular vectors" from svd?
>> >
>> > Thanks,
>> > Stefan
>> >
>> > 2011/6/6 Stefan Wienert <stefan@wienert.cc>:
>> >> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
>> >>
>> >> the last step is the matrix multiplication:
>> >>  --arg --numRowsA --arg 20444 \
>> >>  --arg --numColsA --arg 6076937 \
>> >>  --arg --numRowsB --arg 20444 \
>> >>  --arg --numColsB --arg 87 \
>> >> so the result is a 6,076,937 x 87 matrix
>> >>
>> >> The input has 6,076,937 documents (each with 20,444 terms), so judging
>> >> by the dimensions, the result of the matrix multiplication has to be
>> >> the right singular vectors.
>> >>
>> >> So the result is the "concept-document vector matrix" (I think these
>> >> are also called "document vectors"?).
>> >>
>> >> 2011/6/6 Ted Dunning <ted.dunning@gmail.com>:
>> >>> Yes.  These are term vectors, not document vectors.
>> >>>
>> >>> There is an additional step that can be run to produce document
>> >>> vectors.
>> >>>
>> >>> On Sun, Jun 5, 2011 at 1:16 PM, Stefan Wienert <stefan@wienert.cc> wrote:
>> >>>
>> >>>> Compared to SVD, is the result the "right singular vectors"?
>> >>>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Stefan Wienert
>> >>
>> >> http://www.wienert.cc
>> >> stefan@wienert.cc
>> >>
>> >> Telefon: +495251-2026838
>> >> Mobil: +49176-40170270
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>


