mahout-user mailing list archives

From Steven Buss <>
Subject Symmetric eigendecomposition for kernel PCA
Date Wed, 24 Feb 2010 21:53:35 GMT
I was chatting with Jake Mannix on Twitter about MAHOUT-180 and
whether that patch is suitable for sparse symmetric positive-definite
matrices, and he suggested we continue the conversation on the mailing
list, so:

My research partner and I have a dataset that consists of 400,000
users and 1.6 million articles, with about 22 million nonzeros. We are
trying to use this data to make recommendations to users. We have
tried using the SVD and PLSI, both with unsatisfactory results, and
are now attempting kPCA.

We have a 400,000 by 400,000 sparse symmetric positive-definite
matrix, H, for which we need the top couple hundred eigenvectors and
eigenvalues. Jake has told me that I can use MAHOUT-180 unchanged, but
it will be doing redundant work and the output eigenvalues will be the
squares of the ones we actually want. This sounds like a good
approach, but it would be great if Mahout had an optimized
eigendecomposition for symmetric matrices. Jake suggested I submit a
JIRA ticket regarding this, which I plan to do.
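
To illustrate the squared-eigenvalue point, here is a minimal sketch in
plain Python. It assumes the behavior Jake described comes from a solver
that effectively works with H^T * H; that is my reading, not something
confirmed about the patch. The 3x3 matrix and the helper names
(matvec, matmul, top_eigenvalue) are made up for the example and have
nothing to do with Mahout's API. For a symmetric H, H^T * H equals H^2,
so its eigenvalues are the squares of those of H while the eigenvectors
are unchanged.

```python
import math

def matvec(a, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

def top_eigenvalue(a, iterations=500):
    """Power iteration for the dominant eigenvalue of a symmetric PSD matrix."""
    v = [1.0] * len(a)
    for _ in range(iterations):
        w = matvec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-length) converged vector.
    return sum(x * y for x, y in zip(v, matvec(a, v)))

# Hypothetical small symmetric positive-definite matrix standing in for H.
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]

lam_h = top_eigenvalue(H)                            # what we actually want
lam_hth = top_eigenvalue(matmul(transpose(H), H))    # what an H^T * H solver reports

print("top eigenvalue of H:      ", lam_h)
print("top eigenvalue of H^T * H:", lam_hth)
print("sqrt of the latter:       ", math.sqrt(lam_hth))
```

Taking the square root of each reported eigenvalue recovers the
eigenvalue of H, which is valid here because H is positive-definite
and so has no negative eigenvalues.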

H is the pairwise distance in feature space (calculated using a kernel
function) between each pair of users (or some subset of users). After
I mentioned this to Jake, he asked me "why aren't you just doing it
all in one go? Kernelize on the rows, and do SVD on that? Why do the
M*M^t intermediate step?" Unfortunately, I'm not sure what you're
asking, Jake. Can you clarify?

Steven Buss
