From: Ted Dunning
Date: Thu, 23 May 2013 18:26:46 -0700
Subject: Re: SSVD outputs different things vs R version of SVD
To: "user@mahout.apache.org"

The SVD of a matrix is not unique. You can flip the sign of a singular vector (as long as you flip the matching left and right vectors together), and you can reorder the singular values as well. Customary practice is to order the singular values from largest to smallest, but that doesn't make the SVD unique.

Regarding the number of singular values, R's svd routine computes all of the singular values. The nu and nv parameters that you are setting control the number of singular VECTORS that are computed, not the number of singular VALUES.

If you want to experiment, here is an in-memory implementation of stochastic SVD in R. This lets you play with various combinations of parameters.

    incore = function(A, k, p = 10, q = 2) {
      # power iterations: replace A by A (A' A)^q to sharpen the spectrum
      if (q > 0) {
        Z = A
        for (i in 1:q) {
          Z = Z %*% t(A) %*% A
        }
        A = Z
      }
      n = dim(A)[1]
      m = dim(A)[2]
      # random projection onto k+p dimensions, then an orthonormal basis Q
      Y = A %*% matrix(rnorm((k+p) * m), ncol=k+p)
      Q = qr.Q(qr(Y))
      rm(Y)
      B = t(Q) %*% A
      # LQ decomposition of B via QR of t(B), then SVD of the small factor L
      lq = qr(t(B))
      L = t(qr.R(lq))
      s = svd(L)
      U = Q %*% s$u
      V = qr.Q(lq) %*% s$v
      # undo the effect of the power iterations on the singular values
      return (list(u=U, v=V, d=(s$d^(1/(2*q+1)))))
    }

In order to produce interesting data for this, I recommend something like this:

    A = matrix(rnorm(1000*1000), ncol=1000)
    for (i in 1:100) {A[,i] = i^4 * A[,i]}
    plot(svd(A)$d[1:30])
    A = A/1e9

The idea here is that you want a range of singular values. Using this, you can trade off the padding (p) versus the power iterations (q). This combination, for instance, gives me errors of about 1e-13 versus the internal R algorithm:

    s = svd(A)$d[1:20]
    plot(s-incore(A,k=20,p=55,q=1)$d[1:20])


On Thu, May 23, 2013 at 3:32 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:

> Wouldn't I expect to get similar results using Mahout's SSVD vs. R's SVD?
>
> Note the second component of each vector in U and V is the negative of what
> R gives me. Also, R includes a third singular value even when I ask it to
> calculate a rank-2 decomposition.
>
> The output of Mahout's SSVD run on the 3x3 matrix
> $ cat a
> 1 (0.0,0.25,0.25)
> 2 (0.75,0.0,0.25)
> 3 (0.25,0.75,0.5)
>
> $ mahout ssvd -k 2 -p 1 -q 1 --input kv-pairs --output ssvd-out --tempDir tmp-ssvd-2 --reduceTasks 1
> $ mahout seqdumper -i ssvd-out/U -o ssvd-dump-U -b 200
> $ mahout seqdumper -i ssvd-out/V -o ssvd-dump-V -b 200
> $ mahout seqdumper -i ssvd-out/sigma -o ssvd-dump-sigma -b 200
>
> $ cat ssvd-dump-U; cat ssvd-dump-V; cat ssvd-dump-sigma
> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/U/part-m-00000
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
> Key: 1: Value: {0:-0.27511654723856177,1:-0.2590650410646752}
> Key: 2: Value: {0:-0.5012740900141649,1:0.8604052567841447}
> Key: 3: Value: {0:-0.8203872086496734,1:-0.43884860555363264}
> Count: 3
> Input Path: hdfs://localhost:9010/user/akm/ssvd-out/V/part-m-00000
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:-0.5370130951532543,1:0.8012749902922572}
> Key: 1: Value: {0:-0.6322223639715111,1:-0.5893002821703531}
> Key: 2: Value: {0:-0.5584906607349807,1:-0.10336134367394931}
> Count: 3
> Input Path: ssvd-out/sigma
> Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.mahout.math.VectorWritable
> Key: 0: Value: {0:1.0820078223739025,1:0.6684244456504859}
> Count: 1
>
> Versus the output of R's SVD run on the same 3x3 matrix
> > mp
>      [,1] [,2] [,3]
> [1,] 0.00 0.25 0.25
> [2,] 0.75 0.00 0.25
> [3,] 0.25 0.75 0.50
>
> > s <- svd(mp,2,2)
> > s
> $d
> [1] 1.08200782 0.66842445 0.08641662
>
> $u
>            [,1]       [,2]
> [1,] -0.2751165  0.2590650
> [2,] -0.5012741 -0.8604053
> [3,] -0.8203872  0.4388486
>
> $v
>            [,1]       [,2]
> [1,] -0.5370131 -0.8012750
> [2,] -0.6322224  0.5893003
> [3,] -0.5584907  0.1033613
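
As a quick check of the sign ambiguity on the numbers quoted above, here is a small sketch in R. It assumes the U and V factors are typed in by hand from the two outputs (rounded to seven digits), and the variable names are only for illustration. The idea is that column i of U and column i of V have to flip together, which leaves U diag(d) V' unchanged.

```r
# Factors copied by hand from the outputs above, rounded to 7 digits.
# matrix() fills column by column, so each row of numbers below is one column.
U_mahout = matrix(c(-0.2751165, -0.5012741, -0.8203872,
                    -0.2590650,  0.8604053, -0.4388486), ncol = 2)
V_mahout = matrix(c(-0.5370131, -0.6322224, -0.5584907,
                     0.8012750, -0.5893003, -0.1033613), ncol = 2)
U_r      = matrix(c(-0.2751165, -0.5012741, -0.8203872,
                     0.2590650, -0.8604053,  0.4388486), ncol = 2)
V_r      = matrix(c(-0.5370131, -0.6322224, -0.5584907,
                    -0.8012750,  0.5893003,  0.1033613), ncol = 2)

# One sign per singular vector pair: +1 if the Mahout column points the same
# way as R's column, -1 if it points the opposite way.
s = sign(colSums(U_mahout * U_r))

# Apply the same per-column sign to both U and V.
U_aligned = sweep(U_mahout, 2, s, `*`)
V_aligned = sweep(V_mahout, 2, s, `*`)

# Largest remaining discrepancy between the aligned Mahout factors and R's.
print(max(abs(U_aligned - U_r)))
print(max(abs(V_aligned - V_r)))
```

If both printed differences are at or very near zero, the two decompositions agree apart from the sign of the second singular vector pair.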