mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Errors in SSVD
Date Sat, 13 Aug 2011 20:37:12 GMT
Ok, I got cdh3u0 working.

The problem is of course that something called the BBt job has to be coerced
to have 1 reducer (that's fine: no mapper will yield more than one
upper-triangular matrix of (k+p) x (k+p) geometry, so even if you end up
having thousands of them, the reducer will sum them up just fine).
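
A minimal sketch of why a single reducer is enough here, assuming each mapper's
partial result is stored as a packed (row-major) upper triangle. The class and
method names below are hypothetical and are not Mahout's API; the sketch only
illustrates the arithmetic. With k = 40 and p = 60, each partial has
n(n+1)/2 = 5050 entries for n = k+p = 100, so summing even thousands of them is
cheap.

```java
public class UpperTriangularSum {

    /** Number of entries in the packed upper triangle of an n x n matrix. */
    public static int packedSize(int n) {
        return n * (n + 1) / 2;
    }

    /** Index of element (row, col), with row <= col, in row-major packed storage. */
    public static int packedIndex(int n, int row, int col) {
        // Rows 0..row-1 hold n, n-1, ..., n-row+1 entries respectively.
        return row * n - row * (row - 1) / 2 + (col - row);
    }

    /** What a single reducer would do per incoming partial: element-wise add. */
    public static void addInto(double[] total, double[] partial) {
        for (int i = 0; i < total.length; i++) {
            total[i] += partial[i];
        }
    }

    public static void main(String[] args) {
        int k = 40;          // requested rank
        int p = 60;          // oversampling
        int n = k + p;       // side of each partial matrix
        int mappers = 2000;  // hypothetical number of mapper outputs

        double[] total = new double[packedSize(n)];
        double[] partial = new double[packedSize(n)];
        java.util.Arrays.fill(partial, 1.0); // stand-in for one mapper's output

        for (int m = 0; m < mappers; m++) {
            addInto(total, partial);
        }
        System.out.println("entries per partial: " + packedSize(n));
        System.out.println("diagonal corner: " + total[packedIndex(n, n - 1, n - 1)]);
    }
}
```

The reducer only ever holds one accumulator of 5050 doubles, regardless of how
many mappers contribute.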

It apparently worked before because the configuration held 1 reducer by
default if not set explicitly. I am not quite sure whether it is something in
the hadoop mr client or a mahout change that now precludes it from working.

Anyway, I have a patch (really a one-liner), and with it an example equivalent
to yours worked fine for me with 3 reducers.

Also, the tests request 3 reducers too. The reason it works in tests and not
in distributed mapred is that local mapred doesn't support multiple reducers.
I investigated this issue before, and apparently there were a couple of
patches floating around, but for some reason those changes did not make it
into cdh3u0.
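
A toy model of the difference between one and several reducers, using the same
formula as Hadoop's default HashPartitioner. The integer keys are hypothetical;
this does not claim to show how the BBt job keys its output. It only shows that
with more than one reducer the results can land in several partitions, each
holding a partial result, whereas a single reducer always collects everything.

```java
public class ReducerCountDemo {

    /** Same formula as Hadoop's default HashPartitioner. */
    public static int partition(int keyHash, int numReduceTasks) {
        return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Counts how many reducers receive at least one of the given keys. */
    public static int nonEmptyOutputs(int[] keys, int numReduceTasks) {
        boolean[] used = new boolean[numReduceTasks];
        for (int key : keys) {
            used[partition(Integer.hashCode(key), numReduceTasks)] = true;
        }
        int count = 0;
        for (boolean u : used) {
            if (u) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 2, 3, 4, 5}; // hypothetical mapper output keys

        // One reducer (what local mapred effectively gives you): one output.
        System.out.println("1 reducer  -> " + nonEmptyOutputs(keys, 1) + " non-empty output(s)");

        // Three reducers on a real cluster: results are split across outputs.
        System.out.println("3 reducers -> " + nonEmptyOutputs(keys, 3) + " non-empty output(s)");
    }
}
```

Code that expects exactly one summed result downstream is only safe in the
first case.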

I will publish the patch in a JIRA shortly and will commit it Sunday-ish.

Thanks.
-d


On Fri, Aug 5, 2011 at 7:06 PM, Eshwaran Vijaya Kumar <
evijayakumar@mozilla.com> wrote:

> OK. So to add more info to this, I tried setting the number of reducers to
> 1 and now I don't get that particular error. The singular values and left
> and right singular vectors appear to be correct though (verified using
> Matlab).
>
> On Aug 5, 2011, at 1:55 PM, Eshwaran Vijaya Kumar wrote:
>
> > All,
> >  I am trying to test Stochastic SVD and am facing some errors; it would be
> > great if someone could clarify what is going on. I am trying to feed the
> > solver a DistributedRowMatrix with the exact same parameters that the test
> > in LocalSSVDSolverSparseSequentialTest uses, i.e., generate a 1000 x 100
> > DRM with SequentialSparseVectors and then ask for blockHeight 251, p
> > (oversampling) = 60, k (rank) = 40. I get the following error:
> >
> > Exception in thread "main" java.io.IOException: Unexpected overrun in upper triangular matrix files
> >        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.loadUpperTriangularMatrix(SSVDSolver.java:471)
> >        at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:268)
> >        at com.mozilla.SSVDCli.run(SSVDCli.java:89)
> >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >        at com.mozilla.SSVDCli.main(SSVDCli.java:129)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> >
> > Also, I am using CDH3 with Mahout recompiled to work with CDH3 jars.
> >
> > Thanks
> > Esh
> >
>
>
