Hi Stefanie!
Not many people have used the HebbianSolver in Mahout (which is
too bad -- it was better tested back when it was in the decomposer project,
while the LanczosSolver is newer and less knocked about), so I'm
certainly interested in hearing about how it's working for you.
[response inline]
On Thu, Jan 13, 2011 at 3:34 AM, Stefanie Nagel <nagel@epoq.de> wrote:
>
>
> In the paper "Generalized Hebbian Algorithm for Latent Semantic Analysis"
> (at
> http://www.dcs.shef.ac.uk/~genevieve/gorrell_webb.pdf) the algorithm is
> described. In the documentation of the HebbianSolver it is described as an
> iterative, sparse, singular value decomposition solver.
>
Yep, and I would add to the statement that it's not just an iterative,
sparse SVD solver, but a *stream-oriented* one -- while it's not
distributed, it does not require holding the entire input data set in
memory at once.
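To make "stream-oriented" concrete, here's a toy illustration (my own
plain-Java sketch, not Mahout code -- the class and method names are
invented, and it uses plain power iteration rather than the actual
Hebbian update, but the one-row-at-a-time memory profile is the same):

```java
// Illustrative sketch only -- not the Mahout implementation. It shows the
// stream-oriented idea: find the top right singular vector of A by passing
// over A's rows one at a time, never materializing A^t*A in full.
// (Mahout's HebbianSolver uses a Hebbian update rule rather than this
// plain power iteration, but the memory profile is the same.)
public class StreamingTopSingularVector {

    static double[] topRightSingularVector(double[][] rowsOfA, int iterations) {
        int dim = rowsOfA[0].length;
        double[] v = new double[dim];
        java.util.Arrays.fill(v, 1.0);       // arbitrary nonzero start
        normalize(v);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[dim];
            for (double[] x : rowsOfA) {     // "stream": only one row needed at a time
                double proj = dot(x, v);     // x . v
                for (int i = 0; i < dim; i++) {
                    next[i] += proj * x[i];  // accumulates (A^t*A)v without forming A^t*A
                }
            }
            normalize(next);
            v = next;
        }
        return v;
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static void normalize(double[] v) {
        double n = Math.sqrt(dot(v, v));
        for (int i = 0; i < v.length; i++) v[i] /= n;
    }

    public static void main(String[] args) {
        // For A = [[2,0],[0,1]], the top right singular vector is (1, 0).
        double[] v = topRightSingularVector(new double[][] {{2, 0}, {0, 1}}, 50);
        System.out.println(v[0] + " " + v[1]);
    }
}
```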
> As far as I understood the HebbianSolver is designed for
> eigendecomposition,
> but like any such algorithm, getting singular vectors out of it is
> immediate
> (since SVD and eigenvalue decomposition are related to each other).
>
Yep. In fact, Gorrell's work shows how you can modify the algorithm
(becoming
an "Asymmetric Generalized Hebbian Algorithm") to find both the left and
right singular vectors simultaneously, but that implementation was not
chosen
for the current codebase. What happens in the current GHA is that the right
singular vectors are discovered directly (these are the vectors which live
in the
same space as documents: they are word-bags).
> Applied process:
> - solverHebbian.solve(matrix, desired_rank) -> result:
>   state.getCurrentEigens() (rows are eigenvectors!) and
>   state.getCurrentEigenValues()
> - get singular values: singularValue = Math.sqrt(eigenValue) -> generate
>   sigma and sigmaInverse
> - U = state.getCurrentEigens().transpose()
> - V = (a.transpose().times(U)).times(sigmaInverse)
> - approximated_a = (U.times(sigma)).times(V.transpose())
> - calculate the frobenius norm of a and approximated_a (in order to see
>   how good the approximation is)
>
Sounds exactly right.
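For anyone else following along, those steps can be sketched in plain
Java on a toy matrix. This is my own illustration, not the Mahout API:
the helper names are made up, and the eigen-rows and eigenvalues are
hand-computed stand-ins for what getCurrentEigens() /
getCurrentEigenValues() would return. Note I've swapped the U/V labels
relative to the quoted steps, since (as I mentioned above) the
eigen-rows are the *right* singular vectors, so their transpose is V
and U is recovered as A * V * sigmaInverse:

```java
// Toy sketch of the reconstruction pipeline (hypothetical helper names,
// not the Mahout Matrix API). eigenRows/eigenValues are hand-computed
// for this tiny A, standing in for the solver's output: the rows are
// eigenvectors of A^t*A, i.e. the right singular vectors of A.
public class SvdFromEigens {

    public static void main(String[] args) {
        double[][] a = {{0, 2}, {1, 0}};
        double[][] eigenRows = {{0, 1}, {1, 0}};  // eigenvectors of A^t*A, one per row
        double[] eigenValues = {4, 1};

        int k = eigenValues.length;
        double[][] sigma = new double[k][k];
        double[][] sigmaInverse = new double[k][k];
        for (int i = 0; i < k; i++) {
            double s = Math.sqrt(eigenValues[i]); // singularValue = sqrt(eigenValue)
            sigma[i][i] = s;
            sigmaInverse[i][i] = 1.0 / s;
        }

        double[][] v = transpose(eigenRows);              // right singular vectors as columns
        double[][] u = times(times(a, v), sigmaInverse);  // recover left singular vectors
        double[][] approx = times(times(u, sigma), transpose(v)); // rank-k reconstruction

        // Frobenius norm of the difference shows how good the approximation is
        // (exactly 0 here, since we kept full rank).
        System.out.println("Frobenius error: " + frobeniusNorm(subtract(a, approx)));
    }

    static double[][] times(double[][] m, double[][] n) {
        double[][] out = new double[m.length][n[0].length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < n[0].length; j++)
                for (int x = 0; x < n.length; x++)
                    out[i][j] += m[i][x] * n[x][j];
        return out;
    }

    static double[][] transpose(double[][] m) {
        double[][] out = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                out[j][i] = m[i][j];
        return out;
    }

    static double[][] subtract(double[][] m, double[][] n) {
        double[][] out = new double[m.length][m[0].length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                out[i][j] = m[i][j] - n[i][j];
        return out;
    }

    static double frobeniusNorm(double[][] m) {
        double sum = 0.0;
        for (double[] row : m)
            for (double x : row)
                sum += x * x;
        return Math.sqrt(sum);
    }
}
```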
> I tested the HebbianSolver with several examples.
> And my question is: which matrix is supposed to be the input matrix for the
> HebbianSolver: A or A*A^t? Since the results are eigenvectors and
> eigenvalues I would expect A*A^t.
>
You input A. The resultant eigenvectors are actually the eigenvectors
of A^t*A, not A*A^t, but they're computed without ever forming A^t*A
directly, just using the rows of A.
> I compared the results for the examples (derived with the HebbianSolver)
> with
> the results derived with R (package corpcor).
>
> 1) If A has to be the input matrix ...
> ... example 1: the process above gets a result which is very similar to the
> result derived by R (U, Sigma and V).
>
This is the right way to compare against the HebbianSolver.
> Annotation:
> * The method getRandomStartingIndex(Matrix corpus, Matrix eigens) produces
> an
> infinite loop when every row of the considered matrix has less than 5
> values
> (see at 1) example 3).
>
This is because the code is assuming you're putting in matrices a bit
bigger than this, yes. It's a hacky way to make sure the algorithm
doesn't start out too slow -- if the row you start with is very small,
its projection onto other rows will also be small, and the algorithm
will take far longer to get started. It's actually hacky enough to be
considered a bug, but one which doesn't show up in real data sets
(every row has fewer than 5 nonzero elements?). To get around this,
you can just comment out "v.getNumNondefaultElements() < 5" on line 253
of HebbianSolver.java.
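To see why it spins forever, here's a simplified sketch of the logic
(my own illustration, not the actual HebbianSolver source -- the class
and method names are invented, and I've bounded the attempts so the
failure returns -1 instead of hanging):

```java
import java.util.Random;

// Simplified sketch (not the actual HebbianSolver source) of why
// getRandomStartingIndex can loop forever: it keeps drawing random row
// indices and rejects any row with fewer than 5 nonzero entries. If no
// row qualifies, the unbounded version never terminates; here the
// attempts are capped so the failure is visible instead of hanging.
public class StartingIndexSketch {

    static int findStartingIndex(double[][] rows, int maxAttempts) {
        Random random = new Random(42);
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int candidate = random.nextInt(rows.length);
            if (countNonzeros(rows[candidate]) >= 5) { // the guard that may never pass
                return candidate;
            }
        }
        return -1; // every draw rejected: the unbounded version would spin forever
    }

    static int countNonzeros(double[] row) {
        int count = 0;
        for (double x : row) {
            if (x != 0.0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Every row has fewer than 5 nonzero entries, so no candidate is accepted.
        double[][] rows = {{1, 0, 0, 0, 0, 0}, {0, 2, 3, 0, 0, 0}, {0, 0, 0, 4, 5, 6}};
        System.out.println(findStartingIndex(rows, 10_000)); // prints -1
    }
}
```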
> * I could post my Java Code as well if that would help.
The easiest way to see how the code is used is in the unit tests:
src/test/java/org/apache/mahout/math/decomposer/hebbian/TestHebbianSolver.java
Let me know if you have any more questions.
jake
