Sorry for the basic question. I've been reading about this for a few hours,
but I'm still confused. I want to use ssvd to reduce the dimensionality of
some tf-idf vectors so I can perform clustering on the result.
Among many other things, I've read:
https://cwiki.apache.org/MAHOUT/dimensionalreduction.html
It says the process for svd is:
bin/mahout svd (original > svdOut)
bin/mahout cleansvd ...
bin/mahout transpose svdOut > svdT
bin/mahout transpose original > originalT
bin/mahout matrixmult originalT svdT > newMatrix
bin/mahout kmeans newMatrix
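As far as I can tell, the net result of the transpose and matrixmult steps is original x svdOut^T. In other words, each document row is replaced by its dot products with the k singular vectors in svdOut, giving a documents x k matrix. Here is a small pure-Python sketch of that arithmetic. It assumes svdOut holds one right singular vector per row. The 3 x 3 matrix is a hypothetical example I built by hand as U * diag(5, 2, 1) * V^T, so its singular vectors are known exactly, and `project` is just an illustrative helper, not anything in Mahout:

```python
from fractions import Fraction as F


def project(docs, basis):
    """Return docs x basis^T: row i holds document i's dot product with each basis vector."""
    return [[sum(d * b for d, b in zip(doc, vec)) for vec in basis] for doc in docs]


# Hypothetical 3-document x 3-term matrix standing in for the tf-idf vectors.
# Built by hand so its singular values are exactly 5, 2 and 1.
original = [
    [F(3), F(4), F(0)],
    [F(16, 25), F(-12, 25), F(6, 5)],
    [F(-12, 25), F(9, 25), F(8, 5)],
]

# Its top-2 right singular vectors, one per row (the role svdOut plays above).
svd_out = [
    [F(3, 5), F(4, 5), F(0)],
    [F(0), F(0), F(1)],
]

# Each document goes from 3 term coordinates to k = 2 coordinates.
reduced = project(original, svd_out)
print([[float(x) for x in row] for row in reduced])
```

Each document ends up with 2 coordinates instead of 3. With real tf-idf vectors, the same step would shrink the term columns down to k, and those rows are what kmeans would then cluster.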
I know you don't need to run cleansvd on ssvd output. My main question is:
which of the three ssvd outputs should I be transposing and multiplying
with the original tf-idf matrix? I'm having trouble understanding the math
that's going on.
ssvd outputs U, V, and sigma, and despite reading a lot, I'm still
confused about which of these outputs I should be using, and how. Could anyone
spell it out for me?
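Working it through, I think the relationship is this. If A ~ U * diag(sigma) * V^T, then A * V = U * diag(sigma) * V^T * V = U * diag(sigma), because the columns of V are orthonormal. This still holds when only the top k columns are kept. If that's right, then multiplying the original matrix by V (what the wiki steps do with svdOut) gives exactly U scaled by sigma. That would make ssvd's U (times sigma) the reduced matrix already, with no transpose or matrixmult needed. Below is a quick pure-Python check of that identity on a hand-built example using exact fractions. The helper functions are hypothetical illustrations, not Mahout's API, and I'm assuming ssvd's U, V and sigma follow this textbook convention:

```python
from fractions import Fraction as F


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def scale_columns(m, sigma):
    """m * diag(sigma): multiply column j of m by sigma[j]."""
    return [[x * s for x, s in zip(row, sigma)] for row in m]


def first_cols(m, k):
    return [row[:k] for row in m]


# Hand-built full SVD with exact rationals: the columns of U and V are orthonormal.
U = [[F(1), F(0), F(0)],
     [F(0), F(3, 5), F(-4, 5)],
     [F(0), F(4, 5), F(3, 5)]]
V = [[F(3, 5), F(0), F(-4, 5)],
     [F(4, 5), F(0), F(3, 5)],
     [F(0), F(1), F(0)]]
sigma = [F(5), F(2), F(1)]

# The "original" document x term matrix A = U * diag(sigma) * V^T.
A = matmul(scale_columns(U, sigma), transpose(V))

k = 2  # target rank: keep only the top-k singular triplets
U_k, V_k, sigma_k = first_cols(U, k), first_cols(V, k), sigma[:k]

via_projection = matmul(A, V_k)            # original projected onto the top-k right singular vectors
via_u_sigma = scale_columns(U_k, sigma_k)  # U scaled by sigma, no multiplication by A needed

assert via_projection == via_u_sigma
print([[float(x) for x in row] for row in via_u_sigma])
```

Using U * sigma keeps the same distances between documents as the wiki-style projection. Using U alone would weight every one of the k dimensions equally, which may or may not be what you want for kmeans.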
Thanks for any help,
Matt
