mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Lanczos Algorithm
Date Fri, 19 Nov 2010 16:08:31 GMT
First, the Lanczos decomposition in Mahout is pretty new.  It has passed
some tests, but has a very reasonable probability of significant bugs,
especially if you are giving it a matrix that does not have a strongly
decreasing sequence of singular values.

Remember also that the singular values are not necessarily returned in the
order you might expect.

Can you say a bit more about your data?  How large?  How sparse?

This may also be of interest to you:
https://issues.apache.org/jira/browse/MAHOUT-376

I see this stochastic decomposition as the way forward to larger
decomposition than can be done with the Lanczos solver.

2010/11/19 Fernando Fernández <fernando.fernandez.gonzalez@gmail.com>

> Hi Pedro,
>
> Which version of Mahout are you working with? I think
> DistributedLanczosSolver is available from from version 0.4. I have more or
> less the same doubts about the results as you have, so i'm willing to read
> the answers to these questions :)
>
>
> Best.
> Fernando.
>
> 2010/11/19 PEDRO MANUEL JIMENEZ RODRIGUEZ <pmjimenez1983@hotmail.com>
>
> >
> > Dear Mahout developers,
> >
> > I'm a Computer Science student from the National University of Distance
> > Education in Spain. I'm currently developing my final year project which
> is
> > about Diffusion Maps.
> >
> > This method is used for dimensionality reduction and it uses the Lanczos
> > algorithm during its operations. The method is already implemented in the
> > last release version
> > of Mahout in the LanczosSolver class but we foresee the need to use the
> > algorithm with distributed calculations. This implementation of Diffusion
> > Maps has to deal
> > with extremely large matrices and the distributed calculation is critical
> > for me.
> >
> > I have noticed that there is a DistributedLanzcosSolver class implemented
> > in the Mahout library but I can’t have access to the source code because
> it
> > isn't in the
> > last release version of Mahout.
> >
> >
> > Could you please let me know if I could have access to the source code of
> > this class?Also I would like to ask you about how the LanczosSolver
> > implementation works. I have made some test between this class and other
> > program which has been implemented in R. This program is using a library
> > called Arpack, which also uses the Lanczos algorithm. When I calculate
> the
> > eigenvalues and the eigenvectors of a symmetric matrix. I haven’t the
> same
> > results. For example:
> > For this matrix:
> >
> >
> > 4.42282138 1.51744077 0.07690571 0.93650042 2.19609401
> > 1.51744077 1.73849477 -0.11856149 0.76555191 1.3673608
> > 0.07690571 -0.11856149 0.55065932 1.72163263 -0.2283693
> > 0.93650042 0.76555191 1.72163263 0.09470345 -1.16626194
> > 2.19609401 1.3673608 -0.2283693 -1.16626194 -0.37321311
> > Results for R:
> >
> >
> > Eigenvalues
> >
> >  -0.6442398  1.1084103  2.3946915  6.2018925
> >
> > Eigenvectors            [,1]        [,2]         [,3]       [,4]
> >
> > [1,] -0.17050824  0.46631043 -0.010360993 0.83660453
> > [2,] -0.06455473 -0.87762807 -0.008814402 0.40939079
> > [3,]  0.68602882  0.04706265 -0.666429293 0.02602181
> > [4,] -0.39567054 -0.07491643 -0.670834157 0.12161492
> > [5,]  0.58272541 -0.06705358  0.325066897 0.34208875
> >
> >
> > Results for Java:
> >
> >
> > Eigenvalues
> >
> > 0.0 0.007869004183962289 0.023293016691817894 0.10872358093523908
> > 0.13087002850143611
> > I never get the same eigenvalues, I think this is because the
> documentation
> > of the class says:
> > To avoid floating point overflow problems which arise in power-methods
> like
> > Lanczos, an initial pass is made through the input matrix to generate a
> good
> > starting seed vector by summing all the rows of the input matrix, and
> > compute the trace(inputMatrixt*matrix)
> > This latter value, being the sum of all of the singular values, is used
> to
> > rescale the entire matrix, effectively forcing the largest singular value
> to
> > be strictly
> > less than one, and transforming floating point overflow problems into
> > floating point underflow (ie, very small singular values will become
> > invisible, as they will
> > appear to be zero and the algorithm will terminate).
> > Is it possible to return the eigenvalues to theirs original value?
> > Eigenvectors
> >
> > -0.83660453 0.23122937 0.010360993   0.46631043 -0.17050824
> > -0.40939079 0.24067227 0.008814402  -0.87762807 -0.06455473
> > -0.02602181 0.28695718 0.666429293   0.04706265 0.68602882
> > -0.12161492 -0.61075665 0.670834157  -0.07491643 -0.39567054
> > -0.34208875 -0.65821099 -0.325066897  -0.06705358 0.58272541
> >
> >
> >
> > Always happens the same. I have to force the calculation of N vectors
> (with
> > an N x N matrix) to obtain the same values for the eigenvectors,
> > except in the sign of some of the values, which is acceptable. I thought
> > this implementation of the algorithm should return the eigenvectors
> sorted
> > but all the time I’m obtaining a vector which I don’t want to calculate
> > between them.
> > In the example above it’s the second one starting from the left.Why is
> this
> > happen?
> >
> > Thanks in advance.
> >
> > K.r.Pedro
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message