MIME-Version: 1.0
In-Reply-To: <50CFD234CC7D3A4EA1E8910D3866F700095256FD15@NDA-HCLC-EVS02.HCLC.CORP.HCL.IN>
References: <50CFD234CC7D3A4EA1E8910D3866F700095256FBA9@NDA-HCLC-EVS02.HCLC.CORP.HCL.IN> <50CFD234CC7D3A4EA1E8910D3866F700095256FD15@NDA-HCLC-EVS02.HCLC.CORP.HCL.IN>
Date: Wed, 16 Jan 2013 11:16:08 +0000
Subject: RE: MatrixMultiplicationJob runs with 1 mapper only ?
From: Sean Owen
To: Mahout User List
Content-Type: multipart/alternative; boundary=bcaec5171a474f843d04d3660141

--bcaec5171a474f843d04d3660141
Content-Type: text/plain; charset=UTF-8

Why do you need multiple mappers? Is one too slow? Many are not necessarily
faster for small input.

On Jan 16, 2013 10:46 AM, "Stuti Awasthi" wrote:

> Hi,
> I tried calling it programmatically as well, but I am facing the same issue:
> only a single MapTask is running, and it is spilling the map output
> continuously. Hence I am not able to generate the output for a large matrix
> multiplication.
>
> Code Snippet :
>
> DistributedRowMatrix a = new DistributedRowMatrix(new Path("/test/points/matrixA"),
>     new Path("/test/temp"), Integer.parseInt("100"), Integer.parseInt("100000"));
> DistributedRowMatrix b = new DistributedRowMatrix(new Path("/test/points/matrixA"),
>     new Path("tempDir"), Integer.parseInt("100"), Integer.parseInt("100000"));
> Configuration conf = new Configuration();
> conf.set("fs.default.name", "hdfs://DS-1078D24B4736:10818");
> conf.set("mapred.child.java.opts", "-Xmx2048m");
> conf.set("mapred.max.split.size","10485760");
> a.setConf(conf);
> b.setConf(conf);
> a.times(b);
>
> Where am I going wrong? Any idea?
>
> Thanks
> Stuti
>
> -----Original Message-----
> From: Stuti Awasthi
> Sent: Wednesday, January 16, 2013 2:55 PM
> To: Mahout User List
> Subject: RE: MatrixMultiplicationJob runs with 1 mapper only ?
>
> Hey Sean,
> Thanks for the response. The MatrixMultiplicationJob help shows the usage as:
>
> usage: [Generic Options] [Job-Specific Options]
>
> Here the generic options can be provided with -D <property=value>. Hence I
> tried the -D options on the command line, but they do not seem to have any
> effect. This is also suggested in:
>
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/common/AbstractJob.html
>
> After your suggestion I noticed one thing: currently I am passing arguments
> like -D<property=value> rather than -D <property=value>. I also tried with a
> space between -D and property=value, but then it gives an error like:
>
> 13/01/16 14:21:47 ERROR common.AbstractJob: Unexpected
> /test/points/matrixA while processing Job-Specific Options:
>
> No such error occurs if I pass the arguments without a space after -D.
>
> From "Hadoop: The Definitive Guide": "Do not confuse setting Hadoop
> properties using the -D property=value option to GenericOptionsParser (and
> ToolRunner) with setting JVM system properties using the
> -Dproperty=value option to the java command. The syntax for JVM system
> properties does not allow any whitespace between the D and the property
> name, whereas GenericOptionsParser requires them to be separated by
> whitespace."
>
> Hence I suppose that the generic options should be parsed as
> -D property=value rather than -Dproperty=value.
>
> Additionally, I tried -Dmapred.max.split.size=10485760 on the command line,
> but again only a single MapTask started.
>
> Please suggest.
>
> -----Original Message-----
> From: Sean Owen [mailto:srowen@gmail.com]
> Sent: Wednesday, January 16, 2013 1:23 PM
> To: Mahout User List
> Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?
>
> It's up to Hadoop in the end.
>
> Try calling FileInputFormat.setMaxInputSplitSize() with a smallish value,
> like your 10MB (10000000).
>
> I don't know if Hadoop params can be set as sys properties like that
> anyway?
>
> On Wed, Jan 16, 2013 at 7:48 AM, Stuti Awasthi wrote:
> > Hi,
> >
> > I am trying to multiply a dense matrix of size [100 x 100k]. The size of
> > the file is 104MB, and with the default block size of 64MB only 2 blocks
> > are created.
> > So I reduced the block size to 10MB, and now my file is divided into 11
> > blocks across the cluster. The cluster has 10 nodes: 1 NN/JT and 9 DN/TT.
> >
> > Every time I run the Mahout MatrixMultiplicationJob from the command line,
> > I can see on the JobTracker web UI that only 1 map task is launched.
> > According to my understanding of InputSplit, 11 map tasks should be
> > launched.
> > Apart from this, the map task stays at 0.99% completion, and in the task
> > logs I can see that the map task is spilling the map output.
> >
> > Mahout Command:
> >
> > mahout matrixmult -Dmapred.child.java.opts=-Xmx1024M
> > -Dfs.inmemory.size.mb=200 -Dio.sort.factor=100 -Dio.sort.mb=200
> > -Dio.file.buffer.size=131072 --inputPathA /test/matrixA --numRowsA 100
> > --numColsA 100000 --inputPathB /test/matrixA --numRowsB 100 --numColsB
> > 100000 --tempDir /test/temp
> >
> > I want to know why only 1 map task is launched every time, and how I can
> > tune the cluster so that I can perform dense matrix multiplication of the
> > order [90K x 1 Million].
> >
> > Thanks
> > Stuti
--bcaec5171a474f843d04d3660141--
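
The split arithmetic discussed in this thread (2 splits for a 104MB file with
64MB blocks, 11 expected with a 10MB limit) can be checked in isolation. Below
is a minimal sketch, assuming a single splittable input file and an input
format that follows the split-size rule of Hadoop's new-API FileInputFormat,
max(minSize, min(maxSize, blockSize)), with a 10% allowance on the last split.
The class and method names (SplitEstimate, computeSplitSize, countSplits) are
hypothetical and not part of Hadoop or Mahout. Whether the input format used by
MatrixMultiplicationJob honors mapred.max.split.size is not established in this
thread, so this only shows what the setting would yield where it does apply.

```java
// Sketch only: reproduces the split-size arithmetic without depending on Hadoop.
public class SplitEstimate {
    // Assumption: the last split may overrun the split size by up to 10%
    // before another split is cut, as in Hadoop's FileInputFormat.
    private static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts the splits produced for one splittable file of fileSize bytes.
    static int countSplits(long fileSize, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileSize;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            remaining -= splitSize;
            splits++;
        }
        if (remaining != 0) {
            splits++; // the final, possibly shorter (or slightly longer) split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 104 * mb;   // file size reported in the thread
        long blockSize = 64 * mb;   // default block size reported in the thread

        // No maximum configured: the block size decides.
        System.out.println("default: "
            + countSplits(fileSize, blockSize, 1L, Long.MAX_VALUE));

        // Maximum split size of 10485760 bytes, as set in the thread.
        System.out.println("max=10MB: "
            + countSplits(fileSize, blockSize, 1L, 10485760L));
    }
}
```

Under these assumptions a smaller maximum split size, not a smaller HDFS block
size alone, is what raises the split count for an already-written file.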