mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: SSVD outputs different things vs R version of SVD
Date Fri, 24 May 2013 01:26:46 GMT
The SVD of a matrix is not unique.  You can flip the sign of a singular
vector pair (the matching columns of U and V together) and rearrange the
singular values as well.  Customary practice is to sort the singular values
in decreasing order, but that doesn't make the SVD unique.
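
As a small illustration of the sign ambiguity, here is a sketch that uses
the 3x3 matrix and the Mahout U and V values from the quoted message below.
The helper align_signs is hypothetical (it is not part of R or Mahout), and
its convention, making the largest-magnitude entry of each column of U
positive, is just one arbitrary choice for making two results comparable.

```r
# The 3x3 matrix from the quoted message.
mp <- matrix(c(0.00, 0.25, 0.25,
               0.75, 0.00, 0.25,
               0.25, 0.75, 0.50), nrow = 3, byrow = TRUE)
s <- svd(mp)

# Negate the second left and right singular vectors together.
flip <- diag(c(1, -1, 1))
U2 <- s$u %*% flip
V2 <- s$v %*% flip

# Both factorizations reconstruct the same matrix (errors near machine eps).
err_original <- max(abs(s$u %*% diag(s$d) %*% t(s$v) - mp))
err_flipped  <- max(abs(U2 %*% diag(s$d) %*% t(V2) - mp))

# Hypothetical helper: pick one sign convention so two SVDs can be compared.
# Each column pair (u_i, v_i) is scaled by the sign of the largest-magnitude
# entry of u_i, so that entry ends up positive.
align_signs <- function(u, v) {
  sgn <- apply(u, 2, function(col) sign(col[which.max(abs(col))]))
  list(u = sweep(u, 2, sgn, "*"), v = sweep(v, 2, sgn, "*"))
}

# U and V as printed by Mahout's seqdumper in the quoted message.
mahout_u <- matrix(c(-0.27511654723856177, -0.2590650410646752,
                     -0.5012740900141649,   0.8604052567841447,
                     -0.8203872086496734,  -0.43884860555363264),
                   nrow = 3, byrow = TRUE)
mahout_v <- matrix(c(-0.5370130951532543,   0.8012749902922572,
                     -0.6322223639715111,  -0.5893002821703531,
                     -0.5584906607349807,  -0.10336134367394931),
                   nrow = 3, byrow = TRUE)

r_al <- align_signs(s$u, s$v)
m_al <- align_signs(mahout_u, mahout_v)

# After alignment the first two singular vector pairs should agree.
max(abs(r_al$u[, 1:2] - m_al$u))
max(abs(r_al$v[, 1:2] - m_al$v))
```

Comparing after normalizing signs (or comparing the products u_i v_i', which
do not depend on the sign) avoids mistaking a sign flip for a real
disagreement.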

Regarding the number of singular values, R's svd routine computes all of
the singular values.  The nu and nv parameters that you are setting control
the number of singular VECTORS that are computed, not the number of
singular VALUES.
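
To make that concrete, here is a short sketch on the 3x3 matrix from the
quoted message.  The truncation to rank k is done by hand; the variable
names (k, d_k, approx_k) are only for illustration.

```r
mp <- matrix(c(0.00, 0.25, 0.25,
               0.75, 0.00, 0.25,
               0.25, 0.75, 0.50), nrow = 3, byrow = TRUE)

# nu and nv limit the number of singular vectors returned ...
s <- svd(mp, nu = 2, nv = 2)
dim(s$u)      # 3 2
dim(s$v)      # 3 2
# ... but d still holds all min(nrow, ncol) singular values.
length(s$d)   # 3

# For a rank-k decomposition, truncate d yourself.
k <- 2
d_k <- s$d[1:k]
approx_k <- s$u %*% diag(d_k) %*% t(s$v)

# The Frobenius norm of the residual equals the dropped singular value.
resid_norm <- norm(mp - approx_k, "F")
c(resid_norm, s$d[3])
```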

If you want to experiment, here is an in-memory implementation of
stochastic SVD in R.  This lets you play with various combinations of
parameters.

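incore = function(A, k, p = 10, q = 2) {
  # Power iterations: replace A by A (A'A)^q, which raises each singular
  # value to the power 2q+1 and sharpens the decay of the spectrum.
  if (q > 0) {
    Z = A
    for (i in 1:q) {
      Z = Z %*% t(A) %*% A
    }
    A = Z
  }
  n = dim(A)[1]
  m = dim(A)[2]
  # Random projection onto k+p dimensions, then an orthonormal basis Q.
  Y = A %*% matrix(rnorm((k+p) * m), ncol=k+p)
  Q = qr.Q(qr(Y))
  rm(Y)
  # Project A onto that basis and take an LQ factorization of the small B.
  B = t(Q) %*% A
  lq = qr(t(B))
  L = t(qr.R(lq))
  # SVD of the small (k+p) x (k+p) factor, mapped back to the full space.
  s = svd(L)
  U = Q %*% s$u
  V = qr.Q(lq) %*% s$v
  # Undo the power 2q+1 to recover the singular values of the original A.
  return (list(u=U, v=V, d=(s$d^(1/(2*q+1)))))
}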


In order to produce interesting data for this, I recommend something like
this:

A = matrix(rnorm(1000*1000), ncol=1000)
for (i in 1:100) {A[,i] = i^4 * A[,i]}
plot(svd(A)$d[1:30])
A = A/1e9

The idea here is that you want a range of singular values.

Using this, you can trade off the padding (p) versus the power iterations
(q).

This combination, for instance, gives me errors of about 1e-13 versus the
internal R algorithm.

s = svd(A)$d[1:20]
plot(s-incore(A,k=20,p=55,q=1)$d[1:20])
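
If you want to look at the p versus q trade-off more systematically, a
sketch along these lines tabulates the worst-case error over a small grid.
The incore function is copied unchanged from above so that the block runs on
its own; the seed, the smaller 200x200 matrix, the 1e8 scaling and the grid
values are all assumptions chosen to keep the run short, not recommended
settings.

```r
# incore() as given earlier in this message, repeated so the block is
# self-contained.
incore = function(A, k, p = 10, q = 2) {
  if (q > 0) {
    Z = A
    for (i in 1:q) {
      Z = Z %*% t(A) %*% A
    }
    A = Z
  }
  n = dim(A)[1]
  m = dim(A)[2]
  Y = A %*% matrix(rnorm((k+p) * m), ncol=k+p)
  Q = qr.Q(qr(Y))
  rm(Y)
  B = t(Q) %*% A
  lq = qr(t(B))
  L = t(qr.R(lq))
  s = svd(L)
  U = Q %*% s$u
  V = qr.Q(lq) %*% s$v
  return (list(u=U, v=V, d=(s$d^(1/(2*q+1)))))
}

set.seed(1)                      # assumed seed, only for repeatability
n <- 200                         # smaller than the 1000x1000 example above
A <- matrix(rnorm(n * n), ncol = n)
for (i in 1:50) A[, i] <- i^4 * A[, i]
A <- A / 1e8

k <- 10
exact <- svd(A)$d[1:k]

# Worst absolute error in the top k singular values for each (p, q) pair.
grid <- expand.grid(p = c(5, 20), q = c(0, 1, 2))
grid$max_err <- mapply(
  function(p, q) max(abs(exact - incore(A, k = k, p = p, q = q)$d[1:k])),
  grid$p, grid$q
)
print(grid)
```

The numbers depend on the random projection, so rerun with a few seeds
before reading much into any single row.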





On Thu, May 23, 2013 at 3:32 PM, Andrew Musselman <
andrew.musselman@gmail.com> wrote:

> Wouldn't I expect to get similar results using Mahout's SSVD vs. R's SVD?
>
> Note the second component of each vector in U and V is the negative of what
> R gives me.  Also, R includes a third singular value even when I ask it to
> calculate a rank-2 decomposition.
>
> The output of Mahout's SSVD run on the 3x3 matrix
> $ cat a
> 1 (0.0,0.25,0.25)
> 2 (0.75,0.0,0.25)
> 3 (0.25,0.75,0.5)
>
> $ mahout ssvd -k 2 -p 1 -q 1 --input kv-pairs --output ssvd-out --tempDir
> tmp-ssvd-2 --reduceTasks 1
> $ mahout seqdumper -i ssvd-out/U -o ssvd-dump-U -b 200
> $ mahout seqdumper -i ssvd-out/V -o ssvd-dump-V -b 200
> $ mahout seqdumper -i ssvd-out/sigma -o ssvd-dump-sigma -b 200
>
> $ cat ssvd-dump-U; cat ssvd-dump-V; cat ssvd-dump-sigma
> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/U/part-m-00000
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Key: 1: Value: {0:-0.27511654723856177,1:-0.2590650410646752}
> Key: 2: Value: {0:-0.5012740900141649,1:0.8604052567841447}
> Key: 3: Value: {0:-0.8203872086496734,1:-0.43884860555363264}
> Count: 3
> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/V/part-m-00000
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:-0.5370130951532543,1:0.8012749902922572}
> Key: 1: Value: {0:-0.6322223639715111,1:-0.5893002821703531}
> Key: 2: Value: {0:-0.5584906607349807,1:-0.10336134367394931}
> Count: 3
> Input Path: ssvd-out/sigma
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class
> org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:1.0820078223739025,1:0.6684244456504859}
> Count: 1
>
> Versus the output of R's SVD run on the same 3x3 matrix
> > mp
>      [,1] [,2] [,3]
> [1,] 0.00 0.25 0.25
> [2,] 0.75 0.00 0.25
> [3,] 0.25 0.75 0.50
>
> > s <- svd(mp,2,2)
> > s
> $d
> [1] 1.08200782 0.66842445 0.08641662
>
> $u
>            [,1]       [,2]
> [1,] -0.2751165  0.2590650
> [2,] -0.5012741 -0.8604053
> [3,] -0.8203872  0.4388486
>
> $v
>            [,1]       [,2]
> [1,] -0.5370131 -0.8012750
> [2,] -0.6322224  0.5893003
> [3,] -0.5584907  0.1033613
>
