spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: QR decomposition in Spark ALS
Date Thu, 06 Mar 2014 18:52:43 GMT
Yup, this would definitely be fine. I’d like to understand when this happens though, I imagine
it might be if a user / product has no ratings (though we should certainly try to run well
in that case).

Matei

On Mar 6, 2014, at 10:00 AM, Debasish Das <debasish.das83@gmail.com> wrote:

> Matei,
> 
> If the data has linearly dependent rows ALS should have a failback
> mechanism. Either remove the rows and then call BLAS posv or call BLAS gesv
> or Breeze QR decomposition.
> 
> I can share the analysis over email.
> 
> Thanks.
> Deb
> 
> 
> On Thu, Mar 6, 2014 at 9:39 AM, Matei Zaharia <matei.zaharia@gmail.com>wrote:
> 
>> Xt*X should mathematically always be positive semi-definite, so the only
>> way this might be bad is if it's not invertible due to linearly dependent
>> rows. This might happen due to the initialization or possibly due to
>> numerical issues, though it seems unlikely. Maybe it also happens if some
>> users rated no products, or something like that.
>> 
>> Matei
>> 
>> On Mar 6, 2014, at 7:59 AM, Sean Owen <sowen@cloudera.com> wrote:
>> 
>>> Hmm, Will Xt*X be positive definite in all cases? For example it's not
>>> if X has linearly independent rows? (I'm not going to guarantee 100%
>>> that I haven't missed something there.)
>>> 
>>> Even though your data is huge, if it was generated by some synthetic
>>> process, maybe it is very low rank?
>>> 
>>> QR decomposition is pretty good here, yes.
>>> --
>>> Sean Owen | Director, Data Science | London
>>> 
>>> 
>>> On Thu, Mar 6, 2014 at 3:05 PM, Debasish Das <debasish.das83@gmail.com>
>> wrote:
>>>> Hi Sebastian,
>>>> 
>>>> Yes Mahout ALS and Oryx runs fine on the same matrix because Sean calls
>> QR
>>>> decomposition.
>>>> 
>>>> But the ALS objective should give us strictly positive definite
>> matrix..I
>>>> am thinking more on it..
>>>> 
>>>> There are some random factor assignment step but that also initializes
>>>> factors with normal(0,1)...which I think is not a big deal...
>>>> 
>>>> About QR decomposition, jblas has Solve.solve and
>>>> Solve.solvePositive...solve should also run fine but it does a LU
>>>> factorization and better way will be to do a QR decomposition.
>>>> 
>>>> Seems breeze has QR decomposition and we can make use of that...But QR
>> by
>>>> default is also not correct since if the matrix is positive definite
>> BLAS
>>>> psov (Solve.solvePositive) is much faster due to cholesky computation..
>>>> 
>>>> I believe we need a singular value check and based on that we should
>> call
>>>> solvePositive/solve or qr from breeze..
>>>> 
>>>> There is also a specialized version of TSQR (tall and skinny QR
>>>> decomposition) from Chris which might be good to evaluate as well:
>>>> 
>>>> https://github.com/ccsevers/scalding-linalg
>>>> 
>>>> I am going to debug it further and try to understand why I am getting
>> non
>>>> positive definite matrix and publish the findings...
>>>> 
>>>> Any suggestions on how to proceed further on this ? Should I ask for a
>> PR
>>>> and we can discuss more on it ?
>>>> 
>>>> Thanks.
>>>> Deb
>>>> 
>>>> On Thu, Mar 6, 2014 at 6:47 AM, Sebastian Schelter <ssc@apache.org>
>> wrote:
>>>> 
>>>>> I'm not sure about the mathematical details, but I found in some
>>>>> experiments with Mahout that the matrix there was also not positive
>>>>> definite. Therefore, we chose QR decomposition to solve the linear
>> system.
>>>>> 
>>>>> 
>>>>> --sebastian
>>>>> 
>>>>> 
>>>>> On 03/06/2014 03:44 PM, Debasish Das wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I am running ALS on a sparse problem (10M x 1M) and I am getting
the
>>>>>> following error:
>>>>>> 
>>>>>> org.jblas.exceptions.LapackArgumentException: LAPACK DPOSV: Leading
>> minor
>>>>>> of order i of A is not positive definite.
>>>>>> at org.jblas.SimpleBlas.posv(SimpleBlas.java:373)
>>>>>> at org.jblas.Solve.solvePositive(Solve.java:68)
>>>>>> 
>>>>>> This error from blas shows up if the hessian matrix is not positive
>>>>>> definite...
>>>>>> 
>>>>>> I checked that rating matrix is all > 0 but of course like netflix
>> they
>>>>>> are
>>>>>> not bounded within 1 and 5....
>>>>>> 
>>>>>> Is there some sort of specific initialization of factor matrices
done
>>>>>> which
>>>>>> can make the hessian matrix non positive definite ?
>>>>>> 
>>>>>> I am printing out the eigen vectors and fullXtX matrix to understand
>> it
>>>>>> more but any help will be appreciated.
>>>>>> 
>>>>>> Thanks.
>>>>>> Deb
>>>>>> 
>>>>>> 
>>>>> 
>> 
>> 


Mime
View raw message