It does have something to do with k. Previously I used a formula to
determine which rank to use:
rank = N - p - 1 = 64 - 5 - 1 = 58, where N is the number of columns of
the original matrix.
Then I tried rank = 50, and it worked.
Well... as I write this email, I realize the cause might be that the
actual rank R of the original matrix is much smaller than N. But it is
a bit difficult to figure out R beforehand.
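
For what it's worth, here is a rough sketch (plain Python, not Mahout
code) of how one might estimate R up front and pick the rank from it.
The names numerical_rank and safe_k and the tolerance are my own
assumptions, and reusing the N - p - 1 formula with R in place of N is
a guess on my part, not a documented Mahout rule.

```python
def numerical_rank(rows, tol=1e-9):
    """Estimate the rank of a dense matrix (list of rows) by Gaussian
    elimination with partial pivoting. tol is an absolute threshold and
    is an assumption; it should be scaled to the data."""
    m = [[float(x) for x in r] for r in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column is numerically dependent on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            if f != 0.0:
                for c in range(col, n_cols):
                    m[r][c] -= f * m[rank][c]
        rank += 1
    return rank


def safe_k(n_cols, p, est_rank):
    """Hypothetical helper: apply the N - p - 1 formula, but capped by
    the estimated rank instead of the raw column count."""
    return max(0, min(n_cols, est_rank) - p - 1)


if __name__ == "__main__":
    a = [[0, 0], [0, 0]]            # the all-zero matrix from the thread
    print(numerical_rank(a))        # estimated rank of a
    print(safe_k(64, 5, 64))        # full-rank case: same as N - p - 1
```

On an 8000 x 64 matrix this pure-Python loop will be slow; if NumPy is
available, numpy.linalg.matrix_rank is the usual tool for the same
check.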
thanks
Yang
On Fri, Oct 31, 2014 at 5:01 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> is the matrix by any chance constructed so that it may have rank < k? I
> think MR code is not checking for that.
>
> In spark shell i have :
>
> mahout> val a = dense( (0,0),(0,0) )
> a: org.apache.mahout.math.DenseMatrix =
> {
> 0 => {}
> 1 => {}
> }
> mahout> svd(a)
> res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
> org.apache.mahout.math.DenseVector) =
> ({
> 0 => {0:1.0}
> 1 => {1:1.0}
> },{
> 0 => {0:1.0}
> 1 => {1:1.0}
> },{})
>
> But :
>
> mahout> ssvd(a,2,0)
>
> java.lang.AssertionError: assertion failed: Rankdeficiency detected during
> sSVD
>
> or
> mahout> val drmA = drmParallelize(a,2)
> mahout> dssvd(drmA, k=2)
> java.lang.IllegalArgumentException: R is rankdeficient.
>
>
> the MR version doesn't check for these effects and it may create some
> degenerate results, although i thought those should be 0s, at least when
> q=0. I am not sure for q=1,2...
>
>
>
>
> On Thu, Oct 30, 2014 at 10:35 PM, Yang <teddyyyy123@gmail.com> wrote:
>
> > i am talking about the MR one.
> >
> > thanks
> > yang
> > On Oct 30, 2014 8:16 PM, "Dmitriy Lyubimov" <dlieu.7@gmail.com> wrote:
> >
> > > This is not a known problem...
> > >
> > > there are a few ssvd implementations here: sequential, MR and the
> > > spark one. for the record, which one are you running?
> > >
> > >
> > >
> > > On Thu, Oct 30, 2014 at 4:37 PM, Yang <teddyyyy123@gmail.com> wrote:
> > >
> > > > we are running ssvd on a dataset (this one is relatively small, with
> > > > 8000 rows, number of columns is 64), we ran it with rank = 58, since
> > > > sampling p=5.
> > > >
> > > > the result had NaN on multiple columns.
> > > >
> > > > why would this appear ?
> > > >
> > > > I am now running with lower rank=20 , to see if it goes away.
> > > >
> > > >
> > > > Thanks
> > > > Yang
> > > >
> > >
> >
>
