mahout-dev mailing list archives

From Shannon Quinn <>
Subject Low-rank transition matrix approximation
Date Mon, 12 Jul 2010 23:38:56 GMT
Hi all,

I have a question that's a bit more rooted in theory. In my clustering
algorithm, I depend on the Markovian properties of the graph representation
of the data, using the eigenvectors/eigenvalues to determine where the
cluster boundaries are. The Markov transition matrix is dense, as it
represents the full distribution of a random walk through the entire data set.
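To make the setup concrete, here is a small illustrative sketch (my own toy example, not from the thesis): build a row-stochastic Markov transition matrix from a graph's adjacency matrix, then read the cluster boundary off the sign structure of the second eigenvector. The specific 6-node graph and the sign-based split are assumptions for illustration.

```python
import numpy as np

# Hypothetical toy graph: two triangles joined by a single bridge edge
# (nodes 0-2 form one cluster, nodes 3-5 the other).
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

# Row-normalize to get the Markov transition matrix P:
# P[i, j] is the probability that a random walk at node i steps to node j.
P = A / A.sum(axis=1, keepdims=True)

# Eigendecomposition of P. The leading eigenvalue of a stochastic matrix
# is 1; the sign pattern of the second eigenvector marks the cut between
# the two loosely connected clusters.
eigvals, eigvecs = np.linalg.eig(P)
order = np.argsort(-eigvals.real)
second = eigvecs[:, order[1]].real
clusters = second > 0  # nodes grouped by the sign of the second eigenvector
```

Note that P here is dense only in the general case; for this toy graph it inherits the sparsity of A, which is exactly why the full random-walk distribution over all steps (rather than one step) is the expensive object.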

Within the thesis for this algorithm (link: ), the author dedicates
an entire chapter (chapter 8) to using successively lower-rank sparse
approximations of the full Markov transition matrix to speed up computation
of the eigendecomposition (the bottleneck being that the decomposition is
performed at every iterative step of the clustering process).
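The intuition behind that chapter can be sketched as follows. This is an illustrative stand-in for the thesis's approximation scheme, not its actual method: threshold small transition probabilities to zero, renormalize the rows, and hand the resulting sparse matrix to an iterative eigensolver, whose per-iteration cost scales with the number of nonzeros rather than n².

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def sparsify_transition_matrix(P, eps=1e-4):
    """Zero out transition probabilities below eps and renormalize rows.

    A hypothetical, simplified surrogate for the thesis's lower-rank
    sparse approximations; the exact construction there may differ.
    """
    Q = np.where(P >= eps, P, 0.0)
    Q /= Q.sum(axis=1, keepdims=True)  # restore row-stochasticity
    return sp.csr_matrix(Q)

# A dense random-walk matrix on n nodes whose mass concentrates near the
# diagonal, so most entries are tiny and thresholding discards them.
n = 200
W = np.exp(-np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / 3.0)
P = W / W.sum(axis=1, keepdims=True)

Q = sparsify_transition_matrix(P, eps=1e-4)

# An iterative (Arnoldi/Lanczos-style) solver only needs matrix-vector
# products, so the sparse Q makes each product much cheaper than with P.
vals, vecs = spla.eigs(Q, k=3, which="LM")
```

The question below is essentially whether this single-machine win survives once the matrix-vector products are themselves map/reduce jobs.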

My question is whether, on a framework like Mahout, this optimization is
necessary. I certainly don't doubt there would be a performance
improvement, but would it be noticeable across a distributed framework? If
the Markov transition matrix is already a DistributedRowMatrix, and I'm
already using tools like the DistributedLanczosSolver, is an extra
map/reduce pass for creating a lower-rank Markov matrix really worth it?
I'm very eager to hear other folks' thoughts on this. Thanks!

