mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: LanczosSVD and Eigenvalues
Date Thu, 23 Jun 2011 17:45:44 GMT
Ah, if your matrix only has 2 columns, you can't go to rank 10. Try it on
some slightly less synthetic data with a rank greater than 10. You can't
ask Lanczos for a higher reduced rank than the rank of the matrix itself.
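A minimal sketch of why the rank is capped, using the textbook three-term Lanczos recurrence in plain Java (not Mahout's LanczosSolver; the class and method names are hypothetical). On a 2x2 symmetric matrix, such as the covariance or Gram matrix of 2-column data, the second beta coefficient is exactly 0 in exact arithmetic and only round-off sized in floating point, so there is no third direction for the iteration to find. Normalizing by a zero (or near-zero) beta is plausibly where NaN values like those in Trevor's log come from.

```java
import java.util.Arrays;

/**
 * Hypothetical demo (not Mahout's LanczosSolver): plain three-term Lanczos recurrence on a
 * dense symmetric matrix, returning the beta (off-diagonal) coefficient produced at each step.
 */
final class LanczosBreakdownDemo {

    /** y = A x for a dense square matrix A. */
    static double[] times(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += x[i] * y[i];
        }
        return s;
    }

    /** Runs `steps` Lanczos steps from a normalized all-ones start vector. */
    static double[] betas(double[][] a, int steps) {
        int n = a.length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n));
        double[] vPrev = new double[n]; // v_0 = 0
        double betaPrev = 0.0;
        double[] result = new double[steps];
        for (int k = 0; k < steps; k++) {
            double[] w = times(a, v);
            double alpha = dot(w, v);
            for (int i = 0; i < n; i++) {
                w[i] -= alpha * v[i] + betaPrev * vPrev[i]; // remove components along v_k, v_{k-1}
            }
            double beta = Math.sqrt(dot(w, w));
            result[k] = beta;
            // Next basis vector is w / beta. Once the Krylov space is exhausted, beta is 0 in exact
            // arithmetic (0/0 = NaN) or round-off sized in floating point (a meaningless direction).
            vPrev = v;
            v = new double[n];
            for (int i = 0; i < n; i++) {
                v[i] = w[i] / beta;
            }
            betaPrev = beta;
        }
        return result;
    }

    public static void main(String[] args) {
        // A 2x2 symmetric matrix, e.g. the Gram or covariance matrix of 2-column data.
        double[][] a = {{4.0, 1.0}, {1.0, 2.0}};
        System.out.println(Arrays.toString(betas(a, 2)));
    }
}
```

The first beta is about 1 for this matrix; the second is at round-off level, which is the signal that the matrix has no more rank to give.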

  -jake

2011/6/23 <tra26@cs.drexel.edu>

> Alright, I can reorder; that is easy. I just had to verify that the ordering
> was correct. When I increased the rank of the results, though, Lanczos
> bails out, which incidentally causes a NullPointerException:
>
> INFO: 9 passes through the corpus so far...
> WARNING: Lanczos parameters out of range: alpha = NaN, beta = NaN. Bailing out early!
> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
> Exception in thread "main" java.lang.NullPointerException
>        at org.apache.mahout.math.DenseVector.assign(DenseVector.java:133)
>        at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:160)
>        at pca.PCASolver.solve(PCASolver.java:53)
>        at pca.PCA.main(PCA.java:20)
>
> I should also note that my test data only has 2 columns; the real data
> will have quite a few more.
>
> The failure happens with a rank of 10 or more, and the last (and
> therefore most significant) eigenvector is <NaN,NaN>.
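A minimal defensive sketch, assuming the eigenvectors come back as plain double[] arrays (the class and method names are hypothetical, not Mahout API): cap the requested rank at the matrix dimensions before calling the solver, and discard any eigenvector containing NaN or infinite entries afterwards. The cap is only an upper bound; the true rank can be lower.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers (not Mahout API) for guarding a Lanczos-based PCA run. */
final class LanczosGuards {

    /**
     * Caps the requested rank at min(numRows, numCols), an upper bound on the matrix rank.
     * The true rank can be lower (e.g. duplicated or collinear columns).
     */
    static int clampRank(int requestedRank, int numRows, int numCols) {
        if (requestedRank < 1 || numRows < 1 || numCols < 1) {
            throw new IllegalArgumentException("rank and dimensions must be positive");
        }
        return Math.min(requestedRank, Math.min(numRows, numCols));
    }

    /** True if every component is finite (no NaN, no +/-Infinity). */
    static boolean isFinite(double[] vector) {
        for (double x : vector) {
            if (Double.isNaN(x) || Double.isInfinite(x)) {
                return false;
            }
        }
        return true;
    }

    /** Keeps only the eigenvectors whose components are all finite. */
    static List<double[]> dropNonFinite(List<double[]> eigenvectors) {
        List<double[]> kept = new ArrayList<>();
        for (double[] v : eigenvectors) {
            if (isFinite(v)) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(clampRank(10, 1000, 2)); // prints 2: 2-column data has rank <= 2
        System.out.println(isFinite(new double[] {Double.NaN, Double.NaN})); // prints false
    }
}
```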
>
> -Trevor
> > The 0 eigenvalue output is not valid, and yes, the output will list the
> > results in *increasing* order, even though it is finding the largest
> > eigenvalues/vectors first.
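Reordering and filtering the output is straightforward once the pairs are collected. A sketch, assuming the eigenvalue/eigenvector pairs are held in a small hypothetical EigenPair class and that the spectrum is nonnegative (as for a covariance matrix), so values at or below a tolerance are treated as spurious:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EigenOrdering {

    /** Hypothetical holder for one eigenvalue and its eigenvector. */
    static final class EigenPair {
        final double value;
        final double[] vector;

        EigenPair(double value, double[] vector) {
            this.value = value;
            this.vector = vector;
        }
    }

    /**
     * Returns the pairs sorted by decreasing eigenvalue, dropping any whose eigenvalue is at or
     * below {@code tolerance} (such as the 0.0 entry in Trevor's log).
     */
    static List<EigenPair> descending(List<EigenPair> pairs, double tolerance) {
        List<EigenPair> kept = new ArrayList<>();
        for (EigenPair p : pairs) {
            if (p.value > tolerance) {
                kept.add(p);
            }
        }
        kept.sort((x, y) -> Double.compare(y.value, x.value)); // largest first
        return kept;
    }

    public static void main(String[] args) {
        // Eigenvalues in the increasing order the solver logged them (vectors omitted here).
        List<EigenPair> logged = Arrays.asList(
                new EigenPair(0.0, new double[0]),
                new EigenPair(44.794928654801566, new double[0]),
                new EigenPair(347.8286920203704, new double[0]));
        for (EigenPair p : descending(logged, 1e-9)) {
            System.out.println(p.value);
        }
    }
}
```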
> >
> > Remember that convergence is gradual, so if you only ask for 3
> > eigenvectors/values, you won't be very accurate. If you ask for 10 or
> > more, the largest few will now be quite good. If you ask for 50, the top
> > 10-20 will be *extremely* accurate, and maybe the top 30 will still be
> > quite good.
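The gradual convergence can be illustrated with a small experiment. The sketch below is hypothetical demo code, not Mahout's solver, and assumes, as in the textbook algorithm, that asking for rank m means running m Lanczos steps. It runs Lanczos with full reorthogonalization on diag(1, 2, ..., 100) and measures how far the largest Ritz value (largest eigenvalue of the m x m tridiagonal matrix T_m) is from the true largest eigenvalue, 100, for m = 3, 10 and 30. The largest eigenvalue of each T_m is found by Sturm-sequence bisection.

```java
import java.util.Arrays;

/** Hypothetical demo (not Mahout code): convergence of the largest Ritz value with m. */
final class RitzConvergenceDemo {

    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += x[i] * y[i];
        }
        return s;
    }

    /** Returns {alpha, beta}: diagonal and off-diagonal of T_m for A = diag(1..n), m <= n. */
    static double[][] tridiagonal(int n, int m) {
        double[][] basis = new double[m][];
        double[] alpha = new double[m];
        double[] beta = new double[m];
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n));
        for (int k = 0; k < m; k++) {
            basis[k] = v;
            double[] w = new double[n];
            for (int i = 0; i < n; i++) {
                w[i] = (i + 1) * v[i]; // w = A v for A = diag(1..n)
            }
            alpha[k] = dot(w, v);
            // Orthogonalize against every basis vector so far (two passes for stability).
            for (int pass = 0; pass < 2; pass++) {
                for (int j = 0; j <= k; j++) {
                    double c = dot(w, basis[j]);
                    for (int i = 0; i < n; i++) {
                        w[i] -= c * basis[j][i];
                    }
                }
            }
            beta[k] = Math.sqrt(dot(w, w));
            v = new double[n];
            for (int i = 0; i < n; i++) {
                v[i] = w[i] / beta[k];
            }
        }
        return new double[][] {alpha, beta};
    }

    /** Number of eigenvalues of T_m strictly below x (Sturm count via LDL^T pivots). */
    static int countBelow(double[] alpha, double[] beta, double x) {
        int count = 0;
        double d = 1.0;
        for (int i = 0; i < alpha.length; i++) {
            double b2 = (i == 0) ? 0.0 : beta[i - 1] * beta[i - 1];
            d = alpha[i] - x - b2 / d;
            if (d == 0.0) {
                d = 1e-300; // avoid dividing by zero at the next step
            }
            if (d < 0.0) {
                count++;
            }
        }
        return count;
    }

    /** Largest eigenvalue of T_m by bisection inside its Gershgorin interval. */
    static double largestRitzValue(double[] alpha, double[] beta) {
        int m = alpha.length;
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < m; i++) {
            double r = (i > 0 ? Math.abs(beta[i - 1]) : 0.0) + (i < m - 1 ? Math.abs(beta[i]) : 0.0);
            lo = Math.min(lo, alpha[i] - r);
            hi = Math.max(hi, alpha[i] + r);
        }
        hi += 1e-9; // keep every eigenvalue strictly below hi
        for (int iter = 0; iter < 200; iter++) {
            double mid = 0.5 * (lo + hi);
            if (countBelow(alpha, beta, mid) == m) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return hi;
    }

    /** Error n - (largest Ritz value) after m Lanczos steps on diag(1..n). */
    static double error(int n, int m) {
        double[][] t = tridiagonal(n, m);
        return n - largestRitzValue(t[0], t[1]);
    }

    public static void main(String[] args) {
        for (int m : new int[] {3, 10, 30}) {
            System.out.println("m = " + m + ", error in largest eigenvalue = " + error(100, m));
        }
    }
}
```

Because each T_m is the leading principal submatrix of the next one, the largest Ritz value can only move up toward 100 as m grows (Cauchy interlacing), so the printed errors shrink as m increases.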
> >
> > Try out a non-distributed form of what is in the EigenverificationJob to
> > re-order the output and collect how accurate your results are (it computes
> > errors for you as well).
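A sketch of such a non-distributed check, written in the spirit of EigenverificationJob but not taken from it (the class name and the exact metrics are assumptions): for each candidate eigenvector v of a symmetric matrix A, compute the Rayleigh quotient as the eigenvalue estimate, the cosine between A v and v, and the relative residual |A v - lambda v| / |v|.

```java
import java.util.Arrays;

/** Hypothetical verification helper; not EigenverificationJob's actual code. */
final class EigenCheck {

    /** y = A x for a dense square matrix A. */
    static double[] times(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    static double dot(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += x[i] * y[i];
        }
        return s;
    }

    /** Returns {eigenvalue estimate, cosine(A v, v), relative residual}. */
    static double[] check(double[][] a, double[] v) {
        double[] av = times(a, v);
        double vv = dot(v, v);
        double avDotV = dot(av, v);
        double lambda = avDotV / vv; // Rayleigh quotient
        // +/-1 for an exact eigenvector with nonzero eigenvalue; NaN if A v = 0.
        double cos = avDotV / (Math.sqrt(dot(av, av)) * Math.sqrt(vv));
        double r2 = 0.0;
        for (int i = 0; i < v.length; i++) {
            double diff = av[i] - lambda * v[i];
            r2 += diff * diff;
        }
        return new double[] {lambda, cos, Math.sqrt(r2) / Math.sqrt(vv)};
    }

    public static void main(String[] args) {
        double[][] a = {{2.0, 0.0}, {0.0, 1.0}};
        System.out.println(Arrays.toString(check(a, new double[] {1.0, 0.0}))); // a true eigenvector
        System.out.println(Arrays.toString(check(a, new double[] {1.0, 1.0}))); // not an eigenvector
    }
}
```

A cosine near +/-1 and a residual near 0 indicate a good eigenpair; the {1, 1} vector in main shows what a poor one looks like.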
> >
> >   -jake
> >
> > 2011/6/23 <tra26@cs.drexel.edu>
> >
> >> So, I know that MAHOUT-369 fixed a bug with the distributed version of
> >> the LanczosSolver, but I am experiencing a similar problem with the
> >> non-distributed version.
> >>
> >> I am sending in a dataset of Gaussian-distributed numbers (testing PCA
> >> stuff), and my eigenvalues seem to be reversed. Below is the output
> >> from the LanczosSolver logs.
> >>
> >> Output:
> >> INFO: Eigenvector 0 found with eigenvalue 0.0
> >> INFO: Eigenvector 1 found with eigenvalue 347.8703086831804
> >> INFO: LanczosSolver finished.
> >>
> >> So it returns a vector with eigenvalue 0 before one with an eigenvalue
> >> of 347? What's more interesting is that when I increase the rank, I get
> >> a new eigenvector with a value between 0 and 347:
> >>
> >> INFO: Eigenvector 0 found with eigenvalue 0.0
> >> INFO: Eigenvector 1 found with eigenvalue 44.794928654801566
> >> INFO: Eigenvector 2 found with eigenvalue 347.8286920203704
> >>
> >> Shouldn't the eigenvalues be in descending order? Also, is the 0.0
> >> eigenvalue even valid?
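For a quick sanity check on 2-column test data, the expected answer can be computed directly: center the columns, form the 2x2 sample covariance matrix, and use the closed-form eigenvalues of a symmetric 2x2 matrix, largest first. A sketch, assuming PCA on the sample covariance (dividing by n - 1); the class name and the synthetic data in main are illustrative only. It also makes the rank limit concrete: there are exactly two eigenvalues to find.

```java
import java.util.Random;

/** Hypothetical helper (not Mahout code): PCA eigenvalues for n x 2 data in closed form. */
final class TwoColumnPca {

    /** Sample covariance matrix (divides by n - 1) of an n x 2 data array. */
    static double[][] covariance(double[][] data) {
        int n = data.length;
        double mx = 0.0, my = 0.0;
        for (double[] row : data) {
            mx += row[0];
            my += row[1];
        }
        mx /= n;
        my /= n;
        double sxx = 0.0, sxy = 0.0, syy = 0.0;
        for (double[] row : data) {
            double dx = row[0] - mx, dy = row[1] - my;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        return new double[][] {{sxx / (n - 1), sxy / (n - 1)}, {sxy / (n - 1), syy / (n - 1)}};
    }

    /** Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]], in descending order. */
    static double[] eigenvalues(double[][] s) {
        double a = s[0][0], b = s[0][1], c = s[1][1];
        double mean = 0.5 * (a + c);
        double radius = Math.hypot(0.5 * (a - c), b); // >= 0, so mean + radius is the larger one
        return new double[] {mean + radius, mean - radius};
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[][] data = new double[1000][2];
        for (double[] row : data) {
            row[0] = 3.0 * rng.nextGaussian();
            row[1] = 0.5 * row[0] + rng.nextGaussian(); // correlated second column
        }
        double[] ev = eigenvalues(covariance(data));
        System.out.println("largest = " + ev[0] + ", smallest = " + ev[1]);
    }
}
```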
> >>
> >> Thanks,
> >> Trevor
