mahout-user mailing list archives

From tr...@cs.drexel.edu
Subject Re: LanczosSVD and Eigenvalues
Date Thu, 23 Jun 2011 17:35:03 GMT
Alright, I can reorder; that is easy. I just had to verify that the
ordering was correct. When I increase the rank of the results, Lanczos
bails out, which incidentally causes a NullPointerException:

INFO: 9 passes through the corpus so far...
WARNING: Lanczos parameters out of range: alpha = NaN, beta = NaN. Bailing out early!
INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
Exception in thread "main" java.lang.NullPointerException
	at org.apache.mahout.math.DenseVector.assign(DenseVector.java:133)
	at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:160)
	at pca.PCASolver.solve(PCASolver.java:53)
	at pca.PCA.main(PCA.java:20)

I should probably note that my test data has only 2 columns; the real data
will have quite a few more.

The failure happens with a rank of 10 or more, with the last, and
therefore most significant, eigenvector being <NaN,NaN>.

-Trevor
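
One plausible explanation for the NaN values, not confirmed anywhere in this thread, is ordinary Lanczos breakdown: with only 2 columns the matrix being decomposed is at most 2x2, so the Krylov subspace is exhausted after 2 steps, the residual vanishes, and normalizing by beta = 0 yields NaN. The sketch below is a textbook Lanczos recurrence written from scratch; it is not Mahout's LanczosSolver, and the class name, method names, and the 2x2 test matrix are assumptions chosen so that the floating-point arithmetic is exact.

```java
import java.util.Arrays;

/** Textbook Lanczos recurrence on a tiny dense matrix (hypothetical, not Mahout code). */
class LanczosBreakdownSketch {

    /** Multiplies a small dense matrix by a vector. */
    static double[] multiply(double[][] a, double[] v) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += a[i][j] * v[j];
            }
        }
        return out;
    }

    /** Dot product of two vectors of equal length. */
    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    /** Runs the recurrence for the given number of steps and returns beta from each step. */
    static double[] lanczosBetas(double[][] a, double[] start, int steps) {
        int n = start.length;
        double norm = Math.sqrt(dot(start, start));
        double[] previous = new double[n];
        double[] current = new double[n];
        for (int i = 0; i < n; i++) {
            current[i] = start[i] / norm;
        }
        double beta = 0.0;
        double[] betas = new double[steps];
        for (int k = 0; k < steps; k++) {
            double[] w = multiply(a, current);
            double alpha = dot(w, current);
            // Remove the components along the two most recent basis vectors.
            for (int i = 0; i < n; i++) {
                w[i] -= alpha * current[i] + beta * previous[i];
            }
            beta = Math.sqrt(dot(w, w));
            betas[k] = beta;
            // No guard on purpose: once the residual vanishes this is 0/0 = NaN.
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                next[i] = w[i] / beta;
            }
            previous = current;
            current = next;
        }
        return betas;
    }

    public static void main(String[] args) {
        double[][] a = {{0.0, 1.0}, {1.0, 0.0}};
        double[] start = {1.0, 0.0};
        // Asking for 3 steps on a 2x2 matrix is one step more than the space allows.
        System.out.println(Arrays.toString(lanczosBetas(a, start, 3)));
    }
}
```

On this particular matrix beta is 1 on the first step, exactly 0 on the second, and NaN from the third step on. With real data, roundoff usually leaves a tiny nonzero residual instead of an exact zero, so the symptoms can differ; the point is only that requesting far more steps than the matrix has columns leaves the recurrence nothing valid to compute.
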
> The 0 eigenvalue output is not valid, and yes, the output will list the
> results in *increasing* order, even though it is finding the largest
> eigenvalues/vectors first.
>
> Remember that convergence is gradual, so if you only ask for 3
> eigenvectors/values, the results won't be very accurate.  If you ask for
> 10 or more, the largest few will now be quite good.  If you ask for 50,
> now the top 10-20 will be *extremely* accurate, and maybe the top 30 will
> still be quite good.
>
> Try out a non-distributed form of what is in the EigenverificationJob to
> re-order the output and collect how accurate your results are (it computes
> errors for you as well).
>
>   -jake
>
> 2011/6/23 <tra26@cs.drexel.edu>
>
>> So, I know that MAHOUT-369 fixed a bug with the distributed version of
>> the LanczosSolver, but I am experiencing a similar problem with the
>> non-distributed version.
>>
>> I pass in a dataset of Gaussian-distributed numbers (testing PCA stuff)
>> and my eigenvalues are seemingly reversed. Below is the output given in
>> the logs from LanczosSolver.
>>
>> Output:
>> INFO: Eigenvector 0 found with eigenvalue 0.0
>> INFO: Eigenvector 1 found with eigenvalue 347.8703086831804
>> INFO: LanczosSolver finished.
>>
>> So it returns a vector with eigenvalue 0 before one with an eigenvalue
>> of 347? What's more interesting is that when I increase the rank, I get
>> a new eigenvector with a value between 0 and 347:
>>
>> INFO: Eigenvector 0 found with eigenvalue 0.0
>> INFO: Eigenvector 1 found with eigenvalue 44.794928654801566
>> INFO: Eigenvector 2 found with eigenvalue 347.8286920203704
>>
>> Shouldn't the eigenvalues be in descending order? Also is the 0.0
>> eigenvalue even valid?
>>
>> Thanks,
>> Trevor
>>
>>
>
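
The reordering discussed in this thread can be sketched in a few lines of plain Java. This is a minimal sketch under stated assumptions: the Eigenpair holder, the descending method, and the tolerance parameter are hypothetical names and not part of Mahout, the eigenvalues are taken from the log output quoted above, and the eigenvectors are placeholders because the thread does not show them. Dropping entries with a NaN or near-zero eigenvalue follows the remark that the 0 eigenvalue output is not valid here; data with a genuine zero eigenvalue would need different handling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Reorders solver output so the largest eigenvalue comes first (hypothetical helper). */
class EigenpairReorderSketch {

    /** One eigenvalue together with its eigenvector. */
    static final class Eigenpair {
        final double value;
        final double[] vector;

        Eigenpair(double value, double[] vector) {
            this.value = value;
            this.vector = vector;
        }
    }

    /**
     * Pairs values[i] with vectors[i], drops pairs whose eigenvalue is NaN or within
     * the tolerance of zero, and returns the rest sorted by descending eigenvalue.
     */
    static List<Eigenpair> descending(double[] values, double[][] vectors, double tolerance) {
        List<Eigenpair> kept = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            double value = values[i];
            if (Double.isNaN(value) || Math.abs(value) <= tolerance) {
                continue; // spurious or failed entry
            }
            kept.add(new Eigenpair(value, vectors[i]));
        }
        kept.sort(Comparator.comparingDouble((Eigenpair p) -> p.value).reversed());
        return kept;
    }

    public static void main(String[] args) {
        // Eigenvalues in the increasing order the log reports them.
        double[] values = {0.0, 44.794928654801566, 347.8286920203704};
        // Placeholder vectors; the real ones are not shown in the thread.
        double[][] vectors = {{0.0, 0.0}, {1.0, 0.0}, {0.0, 1.0}};
        for (Eigenpair pair : descending(values, vectors, 1e-9)) {
            System.out.println(pair.value);
        }
    }
}
```

Keeping each eigenvalue attached to its vector before sorting avoids the easy mistake of reordering the values while leaving the vectors in the solver's original order.
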
