mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: SSVD Wrong Singular Vectors
Date Mon, 03 Sep 2012 19:48:36 GMT
There is still a question, I guess, that we never looked at: the effect of
different kinds of distribution samplers for the Omega matrix (0-mean
uniform vs. normal vs. trinary) on small problems, although small matrices
are not what Mahout is about.
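
(For reference, a minimal R sketch of what I mean by the three samplers;
the function names here are just illustrative, not what ssvd.R or the MR
code actually use:)

# three 0-mean samplers for an n x (k+p) Omega matrix (illustrative only)
omega.uniform <- function(n, kp) matrix(runif(n * kp, min = -1, max = 1), n, kp)
omega.normal  <- function(n, kp) matrix(rnorm(n * kp), n, kp)
omega.trinary <- function(n, kp) matrix(sample(c(-1, 0, 1), n * kp, replace = TRUE), n, kp)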

On Mon, Sep 3, 2012 at 12:23 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> OK, so I ran Ahmed's numbers with the optimal (exact) SVD, the sequential
> R version (ssvd.R on the wiki), and the actual MR version. Bottom line:
> both the R ssvd and MR are producing quite a bit of rubbish for these k,p
> parameters and this long-tailed spectrum beyond just the 1st singular
> vector. Obviously it is hard to assess the errors precisely, since I'd
> need a lot of runs to compare and each run creates different output.
>
> here we go (first entries of the first 4 left singular vectors, respectively):
>
> optimal
> s$u[1,1:4]
> [1] -0.09866006 -0.13195226 -0.16620985 -0.04921311
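> (here s is, I assume, just the exact decomposition, i.e. s <- svd(A))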
>
> ssvd.R
>> ss <- ssvd.svd(A,k=30,p=2,qiter=1)
>> ss$u[1,1:4]
> [1] -0.09869577 -0.10368010  0.09543850  0.14122410
>
> MR SSVD
>> U[1,1:4]
>          V2          V3          V4          V5
> -0.09865305 -0.21822886  0.05959496  0.06561257
>
> Of course both versions of SSVD produce the exact result with k+p=100,
> which means the internal flow is correct. So I don't see any meaningful
> discrepancy between the results of MR and ssvd.R in this fairly extreme
> corner case.
>
> So this just means that our choice of k, p and q is too poor for such a
> flat spectrum and such a small matrix beyond the first singular vector.
> Can we do better? Sure. As mentioned, k=30, p=70 gives us the exact
> optimal solution:
>
>> ss <- ssvd.svd(A,k=30,p=70,qiter=0)
>> ss$u[1,1:4]
> [1] -0.09866006  0.13195226  0.16620985 -0.04921311
>>
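>
> For orientation, here is a minimal sketch of the textbook randomized SVD
> that ssvd.R approximates -- this is my own simplified version, not the
> wiki code, but it shows exactly where k, p and qiter enter:
>
> rsvd.sketch <- function(A, k, p = 15, qiter = 0) {
>   # project A onto k+p random directions to estimate its range
>   Omega <- matrix(rnorm(ncol(A) * (k + p)), ncol(A), k + p)
>   Y <- A %*% Omega
>   # power iterations sharpen the range estimate when the spectrum is flat
>   # (real implementations typically re-orthonormalize between iterations)
>   for (i in seq_len(qiter)) Y <- A %*% crossprod(A, Y)
>   Q <- qr.Q(qr(Y))
>   # small SVD of the projected matrix, then map back to the original space
>   B <- crossprod(Q, A)
>   s <- svd(B)
>   list(u = (Q %*% s$u)[, 1:k], d = s$d[1:k], v = s$v[, 1:k])
> }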
>
> So this is a rebuttal to Ahmed's statement:
>>  I tried multiple values for the parameters P and Q but, that does not seem to solve
>> the problem.
>
> No, it kind of does, as demonstrated.
>
> Next: can we try something in between, to work such a bad case into a
> better trade-off between error and the effort spent on precision? Sure.
>
> First, it doesn't make sense to use p<15; in fact, p=15 is the default
> for the MR version. So if you had run the MR version with its default
> parameters, the result should have been something closer to this:
>
> (I'll just run the algorithm several times for a more consistent
> assessment of the error mean and variance.)
>> ss <- ssvd.svd(A,k=30,p=15,qiter=1)
>> ss$u[1,1:4]
> [1] -0.09867691 -0.15782129 -0.14318662 -0.01915917
> [1] -0.09868661 -0.16606151 -0.14881755  0.06277614
> [1] -0.09865008 -0.18475458  0.10799953  0.03643756
> [1] -0.09862877  0.16593100  0.15187904 -0.04660871
> ...
> So this default set of parameters is quite an improvement, although we
> probably still get a fair amount of error at the 4th singular vector and
> beyond.
> (I assume the vector sign is not important; it just means the sign of the
> corresponding column of V is flipped as well, I guess.)
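>
> (If anyone wants to reproduce that kind of assessment, here is a sketch of
> what I do -- a sign-invariant comparison against the exact svd(); the
> variable names are mine, and it assumes the ssvd.svd from the wiki is
> loaded:)
>
> s.exact <- svd(A)
> err <- replicate(20, {
>   ss <- ssvd.svd(A, k = 30, p = 15, qiter = 1)
>   # align column signs before comparing, since sign flips are expected
>   sgn <- sign(colSums(ss$u[, 1:4] * s.exact$u[, 1:4]))
>   max(abs(sweep(ss$u[, 1:4], 2, sgn, "*") - s.exact$u[, 1:4]))
> })
> mean(err); sd(err)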
>
>
> Finally, we can further try to improve with more power iterations.
>
>> ss <- ssvd.svd(A,k=30,p=15,qiter=2)
>> ss$u[1,1:4]
> [1] -0.09866002 -0.13100879  0.16691754 -0.04754525
> [1] -0.09866011 -0.13027399 -0.16417364 -0.04453192
> [1] -0.09866004 -0.13453556 -0.16704920 -0.04517045
> [1] -0.09866006 -0.12905608  0.16230813 -0.04353147
>
> Now that adds even more stability in the 4th vector, doesn't it? The
> error mean is demonstrably decreased.
>
> Bottom line points.
>
> First, there's not much point in investigating corner cases -- small
> matrices, a flat spectrum... You can force SSVD to produce the exact
> solution, but that kind of defeats the purpose, because you could just as
> easily run a full-rank SVD on such a matrix in Matlab. The parameters you
> chose still produce acceptable results for the first vector, where the
> spectrum has a clear trend, but less so where the spectrum is flat.
>
> Second, you can control your errors through oversampling (p) and power
> iterations (q), and you get a lot of flexibility here: you balance
> precision against flops.
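>
> (Back-of-envelope, assuming the standard analysis of this family of
> algorithms: the dominant cost is roughly O(nnz(A)*(k+p)) per pass over A,
> each power iteration adds about two extra such passes (the A^T and A
> products), and the orthogonalization plus the small SVD cost
> O((m+n)*(k+p)^2). So increasing p makes every pass wider, while increasing
> q adds whole passes.)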
>
> Third, in real life it doesn't make sense to demand high accuracy when
> measuring a flat spectrum. A flat spectrum just means you don't have
> trends in those directions, i.e. essentially random noise. If you have
> random noise, the direction of that noise is usually of little interest;
> but because the spectrum (i.e. the singular values) is measured more
> accurately by the method, you can simply state that you have mostly noise
> after a certain n-th singular vector, and chances are that in some cases
> you just won't care. And if you do care, you still have the option of
> increasing the oversampling and perhaps the number of q iterations.
>
> Fourth, unless k+p = rank(A), each run will produce slightly different
> results and errors, so it doesn't make sense to compare errors between
> single runs of the variants. Instead, it makes sense to compare the error
> mean and variance over a larger number of runs.
>
> -d
>
> On Sun, Sep 2, 2012 at 12:00 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> Did Ahmed even use a power iteration?
>>
>> On Sun, Sep 2, 2012 at 1:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>>> but there is still a concern, in the sense that power iterations
>>> should've helped more than they did. I'll take a closer look, but it
>>> will take me a while to figure out whether there's something we can
>>> improve here.
>>>
