mahout-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: NaN produced by SSVD ?
Date Mon, 03 Nov 2014 22:00:37 GMT
It does have something to do with k. Previously I used a formula to
determine the rank:

rank = N - p - 1 = 64 - 5 - 1 = 58, where N is the number of columns of
the original matrix and p is the oversampling parameter.

Then I tried rank = 50, and it worked.

Well... as I write this email, I realize the likely reason: the actual
rank R of the original matrix may be much smaller than N. It is a bit
difficult to figure out R beforehand, though.
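
For a matrix this small (8000 x 64), a rough estimate of R can be computed
offline before picking k. Below is a minimal sketch in plain Python, not
Mahout code: numerical_rank, max_safe_k and rel_tol are hypothetical names
introduced here for illustration, and the bound k + p <= R is an assumption
of mine, not something the MR job documents or enforces. Gaussian elimination
is only a crude rank test; an SVD-based estimate (counting singular values
above a threshold) is more robust when the matrix is close to rank-deficient.

```python
def numerical_rank(rows, rel_tol=1e-9):
    """Estimate the rank of a dense matrix given as a list of row lists.

    Uses Gaussian elimination with partial pivoting. A pivot whose magnitude
    is at most rel_tol * (largest absolute entry) is treated as zero.
    """
    a = [[float(x) for x in r] for r in rows]
    if not a or not a[0]:
        return 0
    m, n = len(a), len(a[0])
    scale = max(abs(x) for r in a for x in r)
    if scale == 0.0:
        return 0  # all-zero matrix, like the dense((0,0),(0,0)) case
    tol = rel_tol * scale
    rank = 0
    for col in range(n):
        if rank == m:
            break
        # Choose the row at or below `rank` with the largest entry in this column.
        pivot = max(range(rank, m), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) <= tol:
            continue  # column is numerically dependent on earlier columns
        a[rank], a[pivot] = a[pivot], a[rank]
        prow = a[rank]
        for i in range(rank + 1, m):
            f = a[i][col] / prow[col]
            if f != 0.0:
                row = a[i]
                for j in range(col, n):
                    row[j] -= f * prow[j]
        rank += 1
    return rank


def max_safe_k(est_rank, p):
    """Largest k with k + p <= est_rank (assumed bound), never below 0."""
    return max(est_rank - p, 0)
```

With the estimate in hand, something like max_safe_k(numerical_rank(rows), 5)
gives a conservative starting value for k instead of N - p - 1.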


thanks
Yang

On Fri, Oct 31, 2014 at 5:01 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> is the matrix by any chance constructed so that it may have rank < k? I
> think MR code is not checking for that.
>
> In spark shell i have :
>
> mahout> val a = dense( (0,0),(0,0) )
> a: org.apache.mahout.math.DenseMatrix =
> {
>   0  => {}
>   1  => {}
> }
> mahout> svd(a)
> res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
> org.apache.mahout.math.DenseVector) =
> ({
>   0  => {0:1.0}
>   1  => {1:1.0}
> },{
>   0  => {0:-1.0}
>   1  => {1:-1.0}
> },{})
>
> But :
>
> mahout> ssvd(a,2,0)
>
> java.lang.AssertionError: assertion failed: Rank-deficiency detected during
> s-SVD
>
> or
> mahout> val drmA = drmParallelize(a,2)
> mahout> dssvd(drmA, k=2)
> java.lang.IllegalArgumentException: R is rank-deficient.
>
>
> the MR version doesn't check for these effects and it may create some
> degenerate results, although i thought those should be 0s, at least when
> -q=0. I am not sure for -q=1,2...
>
>
>
>
> On Thu, Oct 30, 2014 at 10:35 PM, Yang <teddyyyy123@gmail.com> wrote:
>
> > i am talking about the MR one.
> >
> > thanks
> > yang
> > On Oct 30, 2014 8:16 PM, "Dmitriy Lyubimov" <dlieu.7@gmail.com> wrote:
> >
> > > This is not a known problem...
> > >
> > > there are a few ssvd implementations here: sequential, MR and the
> > > spark one. for the record, which one are you running?
> > >
> > >
> > >
> > > On Thu, Oct 30, 2014 at 4:37 PM, Yang <teddyyyy123@gmail.com> wrote:
> > >
> > > > > we are running ssvd on a dataset (this one is relatively small, with
> > > > > 8000 rows, number of columns is 64), we ran it with rank = 58, since
> > > > > sampling p=5.
> > > >
> > > > the result had NaN on multiple columns.
> > > >
> > > > why would this appear ?
> > > >
> > > > I am now running with lower rank=20 , to see if it goes away.
> > > >
> > > >
> > > > Thanks
> > > > Yang
> > > >
> > >
> >
>
