Alright, I can reorder the output, that is easy; I just had to verify that
the ordering was correct. However, when I increase the rank, Lanczos bails
out, which incidentally causes a NullPointerException:
INFO: 9 passes through the corpus so far...
WARNING: Lanczos parameters out of range: alpha = NaN, beta = NaN.
Bailing out early!
INFO: Lanczos iteration complete - now to diagonalize the tridiagonal auxiliary matrix.
Exception in thread "main" java.lang.NullPointerException
at org.apache.mahout.math.DenseVector.assign(DenseVector.java:133)
at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:160)
at pca.PCASolver.solve(PCASolver.java:53)
at pca.PCA.main(PCA.java:20)
I should probably note that my test data has only 2 columns; the real data
will have quite a few more.
The failure happens for any rank of 10 or more, and the last (and therefore
most significant) eigenvector comes back as <NaN,NaN>.
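
One plausible reading of the NaN values (an assumption, not confirmed
against the LanczosSolver source) is that a matrix with only 2 columns
cannot supply more than 2 independent Lanczos directions. Once they are used
up, the next beta is zero and normalizing by it gives 0/0. The sketch below
is a textbook Lanczos loop in plain Java, not Mahout's implementation; the
class name, the 2x2 matrix and the start vector are made up for illustration.

```java
import java.util.Arrays;

final class LanczosBreakdownDemo {

    /**
     * Runs `steps` textbook Lanczos iterations on a small symmetric matrix.
     * Returns {alphas, betas}. `start` is assumed to have unit length.
     */
    static double[][] lanczos(double[][] a, double[] start, int steps) {
        int n = start.length;
        double[] alphas = new double[steps];
        double[] betas = new double[steps];
        double[] prev = new double[n];   // previous basis vector, zero at first
        double[] cur = start.clone();    // current basis vector
        double prevBeta = 0.0;
        for (int j = 0; j < steps; j++) {
            // w = A * cur - prevBeta * prev
            double[] w = new double[n];
            for (int r = 0; r < n; r++) {
                for (int c = 0; c < n; c++) {
                    w[r] += a[r][c] * cur[c];
                }
                w[r] -= prevBeta * prev[r];
            }
            // alpha = w . cur, then remove the component along cur
            double alpha = 0.0;
            for (int r = 0; r < n; r++) {
                alpha += w[r] * cur[r];
            }
            double normSq = 0.0;
            for (int r = 0; r < n; r++) {
                w[r] -= alpha * cur[r];
                normSq += w[r] * w[r];
            }
            double beta = Math.sqrt(normSq);
            alphas[j] = alpha;
            betas[j] = beta;
            // Next basis vector: w / beta. If beta is 0 this is 0/0 = NaN,
            // and every later alpha and beta becomes NaN as well.
            prev = cur;
            cur = new double[n];
            for (int r = 0; r < n; r++) {
                cur[r] = w[r] / beta;
            }
            prevBeta = beta;
        }
        return new double[][] {alphas, betas};
    }

    public static void main(String[] args) {
        double[][] a = {{2.0, 1.0}, {1.0, 2.0}};   // toy 2x2 symmetric matrix
        double[][] result = lanczos(a, new double[] {1.0, 0.0}, 3);
        System.out.println("alphas = " + Arrays.toString(result[0]));
        System.out.println("betas  = " + Arrays.toString(result[1]));
    }
}
```

With this toy input the second beta is exactly zero and the third step
produces alpha = NaN, beta = NaN. Real data would more likely give a tiny
nonzero beta first, so treat this only as an illustration of the mechanism.
Under this reading, keeping the requested rank at or below the number of
columns would avoid the breakdown.
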
Trevor
> The 0 eigenvalue output is not valid, and yes, the output will list the
> results in *increasing* order, even though it is finding the largest
> eigenvalues/vectors first.
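
A minimal sketch of that reordering in plain Java, assuming the eigenvalues
and eigenvectors have already been copied out of the solver into arrays. The
class name, the method name, the cutoff parameter and the placeholder
vectors are all hypothetical; only the three eigenvalues come from the log
output in this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class EigenOrdering {

    /** One eigenvalue together with its eigenvector. */
    static final class Pair {
        final double value;
        final double[] vector;

        Pair(double value, double[] vector) {
            this.value = value;
            this.vector = vector;
        }
    }

    /**
     * Keeps only pairs whose eigenvalue is strictly greater than minValue
     * (this also drops NaN, since NaN > x is false) and returns them with
     * the largest eigenvalue first.
     */
    static List<Pair> largestFirst(double[] values, double[][] vectors, double minValue) {
        List<Pair> kept = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (values[i] > minValue) {
                kept.add(new Pair(values[i], vectors[i]));
            }
        }
        kept.sort(Comparator.comparingDouble((Pair p) -> p.value).reversed());
        return kept;
    }

    public static void main(String[] args) {
        // Eigenvalues as logged, in increasing order.
        double[] values = {0.0, 44.794928654801566, 347.8286920203704};
        // Placeholder vectors only; the thread does not show the real ones.
        double[][] vectors = {{0.0, 0.0}, {0.0, 1.0}, {1.0, 0.0}};
        for (Pair p : largestFirst(values, vectors, 1e-9)) {
            System.out.println(p.value);
        }
    }
}
```

The cutoff of 1e-9 is an arbitrary choice for dropping the invalid 0.0
entry; a threshold relative to the largest eigenvalue may be more robust.
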
>
> Remember that convergence is gradual, so if you only ask for 3
> eigenvectors/values, you won't be very accurate. If you ask for 10 or more,
> the largest few will now be quite good. If you ask for 50, now the top 10-20
> will be *extremely* accurate, and maybe the top 30 will still be quite good.
>
> Try out a non-distributed form of what is in the EigenVerificationJob to
> reorder the output and collect how accurate your results are (it computes
> errors for you as well).
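
A sketch of what such a non-distributed accuracy check could look like in
plain Java. This is not the EigenVerificationJob code; it is a standard
check under stated assumptions: the eigenvalue is re-estimated with the
Rayleigh quotient, and the error is taken as 1 - |cos(angle between A*v and
v)|, which is 0 for an exact eigenvector. The class name, method names and
the toy matrix are hypothetical.

```java
final class EigenCheck {

    /** Dense matrix-vector product A * v. */
    static double[] times(double[][] a, double[] v) {
        double[] out = new double[a.length];
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < v.length; c++) {
                out[r] += a[r][c] * v[c];
            }
        }
        return out;
    }

    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /** Rayleigh quotient (v . Av) / (v . v): the eigenvalue implied by v. */
    static double rayleigh(double[][] a, double[] v) {
        return dot(v, times(a, v)) / dot(v, v);
    }

    /** 1 - |cos(angle between Av and v)|; 0 means v is an exact eigenvector. */
    static double error(double[][] a, double[] v) {
        double[] av = times(a, v);
        double cos = dot(av, v) / (Math.sqrt(dot(av, av)) * Math.sqrt(dot(v, v)));
        return 1.0 - Math.abs(cos);
    }

    public static void main(String[] args) {
        double[][] a = {{2.0, 1.0}, {1.0, 2.0}};   // toy symmetric matrix
        double[] good = {1.0, 1.0};                // a true eigenvector of a
        double[] bad = {1.0, 0.0};                 // not an eigenvector of a
        System.out.println(rayleigh(a, good) + " error " + error(a, good));
        System.out.println(rayleigh(a, bad) + " error " + error(a, bad));
    }
}
```

Note that a zero or NaN vector makes the cosine undefined (NaN), so such
output is best filtered out before running a check like this.
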
>
> jake
>
> 2011/6/23 <tra26@cs.drexel.edu>
>
>> So, I know that MAHOUT-369 fixed a bug with the distributed version of
>> the LanczosSolver, but I am experiencing a similar problem with the
>> non-distributed version.
>>
>> I send a dataset of Gaussian-distributed numbers (testing PCA stuff) and
>> my eigenvalues are seemingly reversed. Below is the output given in
>> the logs from LanczosSolver.
>>
>> Output:
>> INFO: Eigenvector 0 found with eigenvalue 0.0
>> INFO: Eigenvector 1 found with eigenvalue 347.8703086831804
>> INFO: LanczosSolver finished.
>>
>> So it returns a vector with eigenvalue 0 before one with an eigenvalue
>> of 347? What's more interesting is that when I increase the rank, I get
>> a new eigenvector with a value between 0 and 347:
>>
>> INFO: Eigenvector 0 found with eigenvalue 0.0
>> INFO: Eigenvector 1 found with eigenvalue 44.794928654801566
>> INFO: Eigenvector 2 found with eigenvalue 347.8286920203704
>>
>> Shouldn't the eigenvalues be in descending order? Also is the 0.0
>> eigenvalue even valid?
>>
>> Thanks,
>> Trevor
>>
>>
>
