mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Errors in SSVD
Date Sat, 13 Aug 2011 21:28:37 GMT
Dmitriy,

I have had some thoughts on this code and I think it is possible to
eliminate the progressive QR decomposition of Y entirely and gain
significant speed.  I have some preliminary sequential code but need to work
up a larger example and a parallel implementation.

On Sat, Aug 13, 2011 at 2:11 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> NP.
>
> thanks for testing it out.
>
> I would appreciate if you could let me know how it goes with non-full rank
> decomposition and perhaps at larger scale.
>
> One thing to keep in mind is that it projects the input into an m x (k+p)
> _dense_ matrix, assuming that k+p is much less than the number of non-zero
> elements in a sparse row vector. If that is not the case, a random
> projection actually creates more computation, not less. One person tried to
> use it with m in the millions, but the rows were so sparse that there were
> only a handful (~10 avg) of non-zero items per row (somewhat typical for
> user ratings), and he tried to compute hundreds of singular values, which of
> course created more intermediate work than something like Lanczos would
> probably do. That's not a good application of this method.
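
To put rough numbers on the trade-off described above, here is a minimal Java sketch that compares the number of stored non-zeros in the sparse input with the number of entries in the dense m x (k+p) projection. The class name, the method names, and the concrete values (m = 1,000,000, 10 non-zeros per row, k = 300, p = 10) are illustrative assumptions, not figures from this thread, and the entry count is only a crude proxy for actual work.

```java
// Hypothetical helper, not part of Mahout: compares storage of a sparse
// m x n input with the dense m x (k+p) matrix a random projection produces.
class ProjectionCost {

    // Stored entries in the sparse input: m rows, avgNnzPerRow non-zeros each.
    static long sparseEntries(long m, long avgNnzPerRow) {
        return m * avgNnzPerRow;
    }

    // Entries in the dense projection: every row becomes k+p dense values.
    static long denseEntries(long m, int k, int p) {
        return m * (long) (k + p);
    }

    // How many times larger the projection is than the input's non-zeros.
    static double blowUp(long avgNnzPerRow, int k, int p) {
        return (double) (k + p) / avgNnzPerRow;
    }

    public static void main(String[] args) {
        long m = 1_000_000L;    // assumed row count ("millions" of rows)
        long avgNnz = 10L;      // ~10 non-zero items per row
        int k = 300;            // assumed rank ("hundreds" of singular values)
        int p = 10;             // assumed oversampling

        System.out.printf("sparse input entries:     %d%n", sparseEntries(m, avgNnz));
        System.out.printf("dense projection entries: %d%n", denseEntries(m, k, p));
        System.out.printf("blow-up factor:           %.1f%n", blowUp(avgNnz, k, p));
    }
}
```

When the blow-up factor is well above 1, as with these assumed values, the projection is larger than the data it is meant to compress, which is the situation described above.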
>
> Another thing is that you need to have good singular value decay in
> your data; otherwise the results of this method can be surprisingly far
> from the true vectors (in my experiments).
>
> -d
>
>
> On Sat, Aug 13, 2011 at 1:48 PM, Eshwaran Vijaya Kumar <
> evijayakumar@mozilla.com> wrote:
>
> > Dmitriy,
> >  That sounds great. I eagerly await the patch.
> > Thanks
> > Esh
> > On Aug 13, 2011, at 1:37 PM, Dmitriy Lyubimov wrote:
> >
> > > Ok, i got u0 working.
> > >
> > > The problem is of course that something called the BBt job has to be
> > > coerced to have 1 reducer (it's fine: every mapper won't yield more than
> > > an upper-triangular matrix of (k+p) x (k+p) geometry, so even if you end
> > > up having thousands of them, the reducer would sum them up just fine).
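
The reason a single reducer is enough here can be illustrated with a small Java sketch: each partial result is only the upper triangle of a (k+p) x (k+p) matrix, so summing even thousands of them is cheap. The packed row-major layout, the class name, and the method names below are assumptions made for illustration; this is not Mahout's actual BBt job code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: element-wise sum of packed upper-triangular partials,
// as a single reducer might do. Not taken from the Mahout source.
class UpperTriangularSum {

    // Number of stored entries in the upper triangle of an n x n matrix.
    static int packedLength(int n) {
        return n * (n + 1) / 2;
    }

    // Position of element (row, col), with row <= col, in row-major packed storage.
    static int packedIndex(int n, int row, int col) {
        return row * n - row * (row - 1) / 2 + (col - row);
    }

    // Sums all partial triangles; each must have packedLength(n) entries.
    static double[] sum(List<double[]> partials, int n) {
        double[] total = new double[packedLength(n)];
        for (double[] part : partials) {
            if (part.length != total.length) {
                throw new IllegalArgumentException("partial has wrong length: " + part.length);
            }
            for (int i = 0; i < total.length; i++) {
                total[i] += part[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 3; // stands in for k+p
        List<double[]> partials = Arrays.asList(
            new double[] {1, 2, 3, 4, 5, 6},
            new double[] {10, 20, 30, 40, 50, 60});
        double[] total = sum(partials, n);
        System.out.println(Arrays.toString(total));
        System.out.println("entry (1,2) = " + total[packedIndex(n, 1, 2)]);
    }
}
```

With k = 40 and p = 60, as in the example from this thread, packedLength(100) is 5050, so each partial is only a few tens of kilobytes of doubles.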
> > >
> > > It worked before, apparently, because the configuration held 1 reducer
> > > by default if not set explicitly. I am not quite sure whether it is
> > > something in the hadoop mr client or a mahout change that now precludes
> > > it from working.
> > >
> > > anyway, i got a patch (really a one-liner) and an example equivalent to
> > > yours worked fine for me with 3 reducers.
> > >
> > > Also, the tests request 3 reducers as well, but the reason it works in
> > > tests and not in distributed mapred is that local mapred doesn't
> > > support multiple reducers. I investigated this issue before, and
> > > apparently there were a couple of patches floating around, but for some
> > > reason those changes did not take hold in cdh3u0.
> > >
> > > I will publish patch in a jira shortly and will commit it Sunday-ish.
> > >
> > > Thanks.
> > > -d
> > >
> > >
> > > On Fri, Aug 5, 2011 at 7:06 PM, Eshwaran Vijaya Kumar <
> > > evijayakumar@mozilla.com> wrote:
> > >
> > >> OK. So to add more info to this, I tried setting the number of reducers
> > >> to 1 and now I don't get that particular error. The singular values and
> > >> left and right singular vectors appear to be correct, though (verified
> > >> using Matlab).
> > >>
> > >> On Aug 5, 2011, at 1:55 PM, Eshwaran Vijaya Kumar wrote:
> > >>
> > >>> All,
> > >>> I am trying to test Stochastic SVD and am facing some errors; it would
> > >>> be great if someone could clarify what is going on. I am trying to feed
> > >>> the solver a DistributedRowMatrix with the exact same parameters that
> > >>> the test in LocalSSVDSolverSparseSequentialTest uses, i.e., generate a
> > >>> 1000 x 100 DRM with SequentialSparseVectors and then ask for
> > >>> blockHeight 251, p (oversampling) = 60, k (rank) = 40. I get the
> > >>> following error:
> > >>>
> > >>> Exception in thread "main" java.io.IOException: Unexpected overrun in upper triangular matrix files
> > >>>       at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.loadUpperTriangularMatrix(SSVDSolver.java:471)
> > >>>       at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:268)
> > >>>       at com.mozilla.SSVDCli.run(SSVDCli.java:89)
> > >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> > >>>       at com.mozilla.SSVDCli.main(SSVDCli.java:129)
> > >>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >>>       at java.lang.reflect.Method.invoke(Method.java:597)
> > >>>       at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> > >>>
> > >>> Also, I am using CDH3 with Mahout recompiled to work with CDH3 jars.
> > >>>
> > >>> Thanks
> > >>> Esh
> > >>>
> > >>
> > >>
> >
> >
>
