mahout-user mailing list archives

From: Ted Dunning <ted.dunn...@gmail.com>
Subject: Re: Dimensional Reduction via Random Projection: investigations
Date: Sat, 02 Jul 2011 00:25:43 GMT
Here is R code that demonstrates what I mean by stunning (i.e., agreement to
about 15 significant figures).  Note that I only check dot products here;
since the final transform is orthonormal, we know that all distances are
preserved as well (a distance check along those lines is sketched after the
transcript).

# make a big random matrix with rank 20
> a = matrix(rnorm(20000), ncol=20) %*% matrix(rnorm(20000), nrow=20);
> dim(a)
[1] 1000 1000
# random projection
> y = a %*% matrix(rnorm(30000), ncol=30);
# get basis for y
> q = qr.Q(qr(y))
> dim(q)
[1] 1000   30
# re-project b, do svd on result
> b = t(q) %*% a
> v = svd(b)$v
> d = svd(b)$d
# note how singular values drop like a stone at index 21
> plot(d)
> dim(v)
[1] 1000   30
# finish svd just for fun
> u = q %*% svd(b)$u
> dim(u)
[1] 1000   30
# u is orthogonal, right?
> diag(t(u)%*% u)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
# and u diag(d) v' reconstructs a very precisely, right?
> max(abs(a-u %*% diag(d) %*% t(v)))
[1] 1.293188e-12

# now project a into the reduced dimensional space
> aa = a%*%v
> dim(aa)
[1] 1000   30
# check a few dot products
> sum(aa[1,] %*% aa[2,])
[1] 6835.152
> sum(a[1,] %*% a[2,])
[1] 6835.152
> sum(a[1,] %*% a[3,])
[1] 3337.248
> sum(aa[1,] %*% aa[3,])
[1] 3337.248

# wow, that's close.  let's try a hundred dot products
> dot1 = rep(0,100); dot2 = rep(0,100)
> for (i in 1:100) {dot1[i] = sum(a[1,] * a[i,]); dot2[i] = sum(aa[1,] * aa[i,])}

# how close to the same are those?
> max(abs(dot1-dot2))
[1] 3.45608e-11
# VERY
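
The transcript above only exercises dot products.  The same idea extends to
distances, and to the ratio test Lance proposes further down the thread; here
is a minimal sketch along those lines (reusing a and aa from above; the names
od and nd are just illustrative, and outputs are omitted since this was not
part of the original session):

# pairwise distances from row 1 in the original and reduced spaces
od = rep(0, 100); nd = rep(0, 100)
for (i in 1:100) {
  od[i] = sqrt(sum((a[1,] - a[i,])^2))   # original space
  nd[i] = sqrt(sum((aa[1,] - aa[i,])^2)) # reduced space
}
max(abs(od - nd))

# Lance's rough-and-ready fidelity measure: standard deviation of the
# new:old distance ratios (drop i = 1, where both distances are zero)
sd(nd[-1] / od[-1])

Since v has orthonormal columns spanning essentially all of the row space of
a, the ratios should all be very nearly 1 and the standard deviation should
be close to machine precision.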



On Fri, Jul 1, 2011 at 4:54 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Yes.  Been there.  Done that.
>
> The correlation is stunningly good.
>
>
> On Fri, Jul 1, 2011 at 4:22 PM, Lance Norskog <goksron@gmail.com> wrote:
>
>> Thanks!
>>
>> Is this true? - "Preserving pairwise distances" means the relative
>> distances, so the ratios of new distance:old distance should be
>> similar. The standard deviation of the ratios gives a rough-and-ready
>> measure of the fidelity of the reduction. The standard deviation of
>> simple RP should be highest, then this RP + orthogonalization, then
>> MDS.
>>
>> On Fri, Jul 1, 2011 at 11:03 AM, Ted Dunning <ted.dunning@gmail.com>
>> wrote:
>> > Lance,
>> >
>> > You would get better results from the random projection if you did
>> > the first part of the stochastic SVD.  Basically, you do the random
>> > projection:
>> >
>> >       Y = A \Omega
>> >
>> > where A is your original data, \Omega is the random matrix, and Y is
>> > the result.  Y will be tall and skinny.
>> >
>> > Then, find an orthogonal basis Q of Y:
>> >
>> >       Q R = Y
>> >
>> > This orthogonal basis will be very close to the orthogonal basis of
>> > A.  In fact, there are strong probabilistic guarantees on how good Q
>> > is as a basis of A.  Next, you project A using the transpose of Q:
>> >
>> >       B = Q' A
>> >
>> > This gives you a short fat matrix that is the projection of A into a
>> > lower dimensional space.  Since this is a left projection, it isn't
>> > quite what you want in your work, but it is the standard way to
>> > phrase things.  The exact same thing can be done with left random
>> > projection:
>> >
>> >      Y = \Omega A
>> >      L Q = Y
>> >      B = A Q'
>> >
>> > In this form, B is tall and skinny as you would like, and Q' is
>> > essentially an orthogonal reformulation of the random projection.
>> > This projection is about as close as you are likely to get to
>> > something that exactly preserves distances.  As such, you should be
>> > able to use MDS on B to get exactly the same results as you want.
>> >
>> > Additionally, if you start with the original form and do an SVD of B
>> > (which is fast), you will get a very good approximation of the
>> > prominent right singular vectors of A.  If you do that, the first few
>> > of these should be about as good as MDS for visualization purposes.
>> >
>> > On Fri, Jul 1, 2011 at 2:44 AM, Lance Norskog <goksron@gmail.com> wrote:
>> >
>> >> I did some testing and made a lot of pretty charts:
>> >>
>> >> http://ultrawhizbang.blogspot.com/
>> >>
>> >> If you want to get quick visualizations of your clusters, this is a
>> >> great place to start.
>> >>
>> >> --
>> >> Lance Norskog
>> >> goksron@gmail.com
>> >>
>> >
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>
>
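
For reference, the row-oriented form Ted describes in the quoted message
(Y = \Omega A, then L Q = Y, then B = A Q') can be sketched in R as below.
Base R has no LQ routine, so qr() on the transpose stands in for it; the
variable names are illustrative and this was not part of the original
session:

# row-oriented (left) random projection of the 1000 x 1000 matrix a
omega = matrix(rnorm(30000), nrow = 30) # 30 x 1000 random matrix
y = omega %*% a                         # Y = \Omega A: short and fat
q = t(qr.Q(qr(t(y))))                   # orthonormal rows, so L Q = Y
b = a %*% t(q)                          # B = A Q': tall and skinny
dim(b)                                  # 1000 x 30

The rows of b then stand in for the rows of a in a downstream MDS or
clustering step, which is the tall-and-skinny form Lance asked about.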
