mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject RE: MatrixMultiplicationJob runs with 1 mapper only ?
Date Wed, 16 Jan 2013 11:16:08 GMT
Why do you need multiple mappers? Is one too slow? More mappers are not
necessarily faster for small input.
On Jan 16, 2013 10:46 AM, "Stuti Awasthi" <stutiawasthi@hcl.com> wrote:

> Hi,
> I also tried calling the job programmatically, but I am facing the same
> issue: only a single map task runs, and it spills its map output
> continuously. As a result I am not able to generate the output for a large
> matrix multiplication.
>
> Code snippet:
>
> DistributedRowMatrix a = new DistributedRowMatrix(
>     new Path("/test/points/matrixA"), new Path("/test/temp"), 100, 100000);
> DistributedRowMatrix b = new DistributedRowMatrix(
>     new Path("/test/points/matrixA"), new Path("tempDir"), 100, 100000);
> Configuration conf = new Configuration();
> conf.set("fs.default.name", "hdfs://DS-1078D24B4736:10818");
> conf.set("mapred.child.java.opts", "-Xmx2048m");
> conf.set("mapred.max.split.size", "10485760");
> a.setConf(conf);
> b.setConf(conf);
> a.times(b);
>
> Where am I going wrong? Any ideas?
>
> Thanks
> Stuti
> -----Original Message-----
> From: Stuti Awasthi
> Sent: Wednesday, January 16, 2013 2:55 PM
> To: Mahout User List
> Subject: RE: MatrixMultiplicationJob runs with 1 mapper only ?
>
> Hey Sean,
> Thanks for the response. The MatrixMultiplicationJob help shows the usage as:
> usage: <command> [Generic Options] [Job-Specific Options]
>
> Here a generic option can be provided with -D <property=value>. Hence I
> tried the command-line -D options, but they do not seem to have any
> effect. This usage is also suggested in:
>
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/common/AbstractJob.html
>
> After your suggestion I noticed one thing: I am currently passing
> arguments as -D<property=value> rather than -D <property=value>. I also
> tried with a space between -D and property=value, but then it gives an
> error like:
> 13/01/16 14:21:47 ERROR common.AbstractJob: Unexpected
> /test/points/matrixA while processing Job-Specific Options:
>
> No such error occurs if I pass the arguments without a space after -D.
>
> From "Hadoop: The Definitive Guide": "Do not confuse setting Hadoop
> properties using the -D property=value option to GenericOptionsParser (and
> ToolRunner) with setting JVM system properties using the
> -Dproperty=value option to the java command. The syntax for JVM system
> properties does not allow any whitespace between the D and the property
> name, whereas GenericOptionsParser requires them to be separated by
> whitespace."
>
> Hence I suppose that generic options should be passed as -D property=value
> rather than -Dproperty=value.
>
> Additionally I tried -Dmapred.max.split.size=10485760 on the command line
> as well, but again only a single map task started.
>
> Please suggest.
>
>
> -----Original Message-----
> From: Sean Owen [mailto:srowen@gmail.com]
> Sent: Wednesday, January 16, 2013 1:23 PM
> To: Mahout User List
> Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?
>
> It's up to Hadoop in the end.
>
> Try calling FileInputFormat.setMaxInputSplitSize() with a smallish value,
> like your 10MB (10000000).
>
> I don't know if Hadoop params can be set as sys properties like that
> anyway?
>
> On Wed, Jan 16, 2013 at 7:48 AM, Stuti Awasthi <stutiawasthi@hcl.com>
> wrote:
> > Hi,
> >
> > I am trying to multiply a dense matrix of size [100 x 100k]. The size of
> > the file is 104MB, and with the default block size of 64MB only 2 blocks
> > are created.
> > So I reduced the block size to 10MB, and now my file is divided into 11
> > blocks across the cluster. The cluster has 10 nodes: 1 NN/JT and 9
> > DN/TT.
> >
> > Every time I run the Mahout MatrixMultiplicationJob from the command
> > line, I can see on the JobTracker web UI that only 1 map task is
> > launched. According to my understanding of InputSplit, 11 map tasks
> > should be launched.
> > Apart from this, the map task stays at 0.99% completion, and in the task
> > logs I can see that the map task is spilling its map output.
> >
> > Mahout Command:
> >
> > mahout matrixmult -Dmapred.child.java.opts=-Xmx1024M
> > -Dfs.inmemory.size.mb=200 -Dio.sort.factor=100 -Dio.sort.mb=200
> > -Dio.file.buffer.size=131072 --inputPathA /test/matrixA --numRowsA 100
> > --numColsA 100000 --inputPathB /test/matrixA --numRowsB 100 --numColsB
> > 100000 --tempDir /test/temp
> >
> > Now I want to know why only 1 map task is launched every time, and how
> > I can tune the cluster's performance so that I can perform dense matrix
> > multiplication of the order [90K x 1 Million].
> >
> > Thanks
> > Stuti
>
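
The expectation of 11 map tasks in the quoted message follows from the usual file-based split arithmetic. The Java sketch below is a minimal illustration of that arithmetic, not Mahout or Hadoop code: the class name SplitEstimate and its methods are hypothetical, and the rule it encodes (split size = max(min, min(max, block size)), with a 10% slack on the last split) is an assumption modeled on Hadoop's file input formats. The thread does not establish whether the input format used by MatrixMultiplicationJob honors these settings, so the numbers show what was expected, not what the job does.

```java
// Hypothetical helper, for illustration only; not part of Hadoop or Mahout.
class SplitEstimate {
    // Assumed slack factor: the last split may be up to 10% over the target size.
    static final double SPLIT_SLOP = 1.1;

    // Assumed rule: clamp the block size between the min and max split size.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Count the splits produced for a single splittable file of the given size.
    public static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // the final, possibly shorter, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 104 * mb;   // file size reported in the thread
        long blockSize = 64 * mb;   // default block size reported in the thread

        // No max split size set: the block size decides.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(fileSize, defaultSplit));

        // Max split size of 10485760 bytes (10MB), as tried in the thread.
        long cappedSplit = computeSplitSize(blockSize, 1L, 10485760L);
        System.out.println(countSplits(fileSize, cappedSplit));
    }
}
```

Under these assumptions the program prints 2 and then 11, matching the 2 blocks and the 11 expected map tasks mentioned in the thread. If the job still starts a single map task, the split settings are probably not what determines its mapper count.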
