mahout-user mailing list archives

From Matt Molek <mpmo...@gmail.com>
Subject How to use ssvd for dimensionality reduction of tfidf-vectors?
Date Fri, 19 Oct 2012 16:06:06 GMT
Sorry for the basic question. I've been reading about this for a few hours,
but I'm still confused. I want to use ssvd to reduce the dimensionality of
some tfidf-vectors so I can perform clustering on the result.

Among many other things, I've read:
https://cwiki.apache.org/MAHOUT/dimensional-reduction.html

Which states the process for svd is:

bin/mahout svd (original -> svdOut)
bin/mahout cleansvd ...
bin/mahout transpose svdOut -> svdT
bin/mahout transpose original -> originalT
bin/mahout matrixmult originalT svdT -> newMatrix
bin/mahout kmeans newMatrix
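
As a rough sketch of the linear algebra behind those steps: assuming svdOut holds the top-k right singular vectors as rows (so svdT is V_k), and assuming the matrixmult step ends up computing original x V_k, then newMatrix is the documents projected onto k dimensions. If that reading is right, the same matrix equals U_k x Sigma_k by the SVD identity. The toy numbers and helper functions below are hypothetical, pure Python, and not Mahout output.

```python
# Hypothetical toy example: none of these values come from Mahout.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

# Hand-built SVD of a 3-document x 2-term matrix: A = U * S * V^T.
U = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # orthonormal columns
sigma = [5.0, 1.0]                          # singular values, descending
V = [[0.6, -0.8], [0.8, 0.6]]               # columns = right singular vectors
S = [[sigma[0], 0.0], [0.0, sigma[1]]]

A = matmul(matmul(U, S), transpose(V))      # stands in for the tfidf matrix

k = 1                                       # target dimensionality
V_k = [row[:k] for row in V]                # first k right singular vectors
U_k = [row[:k] for row in U]                # first k left singular vectors
S_k = [row[:k] for row in S[:k]]            # top-left k x k block of S

reduced = matmul(A, V_k)                    # project documents: A * V_k
expected = matmul(U_k, S_k)                 # same thing via U_k * Sigma_k

for r, e in zip(reduced, expected):
    print(r, e)
```

Each row of reduced is one document in the k-dimensional space, which is what would be handed to kmeans under these assumptions.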

I know you don't need to do cleansvd with ssvd output. My main question is
which of the three outputs of ssvd should I be transposing and multiplying
with the original tfidf-matrix? I'm having trouble understanding the math
that's going on.

ssvd outputs U, V, and sigma, and despite reading a bunch, I'm still
confused about which of these outputs I should be using, and how. Could
anyone spell it out for me?

Thanks for any help,
Matt
