Dear Mahout developers,
I'm a Computer Science student at the National University of Distance Education in Spain. I'm currently working on my final-year project, which is about Diffusion Maps.
This method is used for dimensionality reduction and relies on the Lanczos algorithm. The algorithm is already implemented in the latest release
of Mahout, in the LanczosSolver class, but I foresee the need to run it as a distributed computation. My implementation of Diffusion Maps has to deal
with extremely large matrices, so distributed computation is critical for me.
I have noticed that there is a DistributedLanczosSolver class in the Mahout library, but I cannot access its source code because it is not included in the
latest release of Mahout.
Could you please let me know how I can get access to the source code of this class?
I would also like to ask how the LanczosSolver implementation works. I have run some tests comparing this class with another
program implemented in R. That program uses a library called ARPACK, which also uses the Lanczos algorithm. When I calculate the eigenvalues and eigenvectors of a symmetric matrix, I do not get the same results. For example:
For this matrix:
4.42282138 1.51744077 0.07690571 0.93650042 2.19609401
1.51744077 1.73849477 -0.11856149 0.76555191 1.3673608
0.07690571 -0.11856149 0.55065932 1.72163263 -0.2283693
0.93650042 0.76555191 1.72163263 0.09470345 -1.16626194
2.19609401 1.3673608 -0.2283693 -1.16626194 -0.37321311
Results for R:
Eigenvalues
-0.6442398 1.1084103 2.3946915 6.2018925
Eigenvectors
[,1] [,2] [,3] [,4]
[1,] -0.17050824 0.46631043 -0.010360993 0.83660453
[2,] -0.06455473 -0.87762807 -0.008814402 0.40939079
[3,] 0.68602882 0.04706265 -0.666429293 0.02602181
[4,] -0.39567054 -0.07491643 -0.670834157 0.12161492
[5,] 0.58272541 -0.06705358 0.325066897 0.34208875
Results for Java:
Eigenvalues
0.0 0.007869004183962289 0.023293016691817894 0.10872358093523908 0.13087002850143611
I never get the same eigenvalues. I think this is because the documentation of the class says:
To avoid floating point overflow problems which arise in power-methods like Lanczos, an initial pass is made through the input matrix to generate a good
starting seed vector by summing all the rows of the input matrix, and compute the trace(inputMatrix^t * matrix).
This latter value, being the sum of all of the singular values, is used to rescale the entire matrix, effectively forcing the largest singular value to be strictly
less than one, and transforming floating point overflow problems into floating point underflow (ie, very small singular values will become invisible, as they will
appear to be zero and the algorithm will terminate).
Is it possible to scale the eigenvalues back to their original values?
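To make the question concrete, here is a minimal sketch of the kind of rescaling I have in mind. It assumes that each reported value equals lambda^2 / trace(A^t * A), where lambda is an eigenvalue of the original matrix. That rule is only inferred from the numbers in this message; I have not confirmed it in the Mahout source. The class and method names are hypothetical and are not part of Mahout.

```java
// Sketch only: the scaling rule is an assumption inferred from the numbers
// in this message, not taken from the Mahout source. Names are hypothetical.
class EigenvalueRescale {
    // The symmetric test matrix from this message.
    static final double[][] A = {
        {4.42282138, 1.51744077, 0.07690571, 0.93650042, 2.19609401},
        {1.51744077, 1.73849477, -0.11856149, 0.76555191, 1.3673608},
        {0.07690571, -0.11856149, 0.55065932, 1.72163263, -0.2283693},
        {0.93650042, 0.76555191, 1.72163263, 0.09470345, -1.16626194},
        {2.19609401, 1.3673608, -0.2283693, -1.16626194, -0.37321311}
    };

    // trace(A^t * A) equals the sum of the squares of all entries of A.
    static double traceOfAtA(double[][] m) {
        double sum = 0.0;
        for (double[] row : m) {
            for (double x : row) {
                sum += x * x;
            }
        }
        return sum;
    }

    // Assumed inverse of the scaling: reported = lambda^2 / trace(A^t * A).
    // Only the magnitude of lambda can be recovered; its sign is lost.
    static double recoverMagnitude(double reported, double trace) {
        return Math.sqrt(reported * trace);
    }

    public static void main(String[] args) {
        double trace = traceOfAtA(A);
        // The non-zero values reported by the Java run above.
        double[] reported = {
            0.007869004183962289, 0.023293016691817894,
            0.10872358093523908, 0.13087002850143611
        };
        for (double r : reported) {
            System.out.println(r + " -> |lambda| ~ " + recoverMagnitude(r, trace));
        }
    }
}
```

Under this assumption the first three non-zero values map back to magnitudes close to 0.644, 1.108 and 2.395, which agree with the R eigenvalues up to sign. The fourth maps to about 2.627, which is not in the R list, and the R value 6.2018925 has no counterpart in the Java list.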
Eigenvectors
-0.83660453 0.23122937 0.010360993 0.46631043 -0.17050824
-0.40939079 0.24067227 0.008814402 -0.87762807 -0.06455473
-0.02602181 0.28695718 0.666429293 0.04706265 0.68602882
-0.12161492 -0.61075665 0.670834157 -0.07491643 -0.39567054
-0.34208875 -0.65821099 -0.325066897 -0.06705358 0.58272541
The same thing always happens. I have to force the calculation of N vectors (for an N x N matrix) to obtain the same values for the eigenvectors,
apart from the sign of some of the values, which is acceptable. I expected this implementation to return the eigenvectors in sorted order, but every time I get an extra vector, which I did not ask for, in between them.
In the example above it is the second one from the left. Why does this happen?
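One check that may help identify the extra vector is the Rayleigh quotient (v^t A v) / (v^t v), which equals the eigenvalue whenever v is an eigenvector of A. The sketch below is plain Java with hypothetical names, independent of Mahout, and applies this to the columns of the Java output above.

```java
// Sketch only: a standalone check, not part of Mahout. Names are hypothetical.
class RayleighCheck {
    // The symmetric test matrix from this message.
    static final double[][] A = {
        {4.42282138, 1.51744077, 0.07690571, 0.93650042, 2.19609401},
        {1.51744077, 1.73849477, -0.11856149, 0.76555191, 1.3673608},
        {0.07690571, -0.11856149, 0.55065932, 1.72163263, -0.2283693},
        {0.93650042, 0.76555191, 1.72163263, 0.09470345, -1.16626194},
        {2.19609401, 1.3673608, -0.2283693, -1.16626194, -0.37321311}
    };

    // Rayleigh quotient (v^t A v) / (v^t v). The result does not depend on
    // the sign or the length of v.
    static double rayleighQuotient(double[][] a, double[] v) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < v.length; i++) {
            double rowDot = 0.0;
            for (int j = 0; j < v.length; j++) {
                rowDot += a[i][j] * v[j];
            }
            numerator += v[i] * rowDot;
            denominator += v[i] * v[i];
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Second column (from the left) of the Java eigenvector output.
        double[] extra = {0.23122937, 0.24067227, 0.28695718, -0.61075665, -0.65821099};
        // First column of the Java eigenvector output.
        double[] first = {-0.83660453, -0.40939079, -0.02602181, -0.12161492, -0.34208875};
        System.out.println("extra vector: " + rayleighQuotient(A, extra));
        System.out.println("first vector: " + rayleighQuotient(A, first));
    }
}
```

For the second column this gives roughly -2.627, which is not among the four eigenvalues R returned, while the first column gives roughly 6.202. So the extra vector may simply be the eigenvector of the fifth eigenvalue, which I did not request in R, but I would appreciate confirmation.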
Thanks in advance.
K.r.Pedro