Date: Mon, 12 Apr 2010 07:15:47 -0400
From: Vimal Mathew
To: mahout-user@lucene.apache.org
Subject: Re: Current state of (dense) matrix multiplication?

The naive matrix-multiplication algorithm is highly parallelizable if you
have the data available locally at all the nodes. The persistent storage
issue was one of the first problems that I tried solving (HDFS is just
wrong for the access patterns in matrix algorithms).

I can't compete with Matlab yet! But I am planning to add support for SSE2
instructions, so I might get close. Also, I don't have systems with 64GB of
RAM or 14 cores in one place :(

I hope to get much better results in a month or two.

On Mon, Apr 12, 2010 at 12:27 AM, Steven Buss wrote:
> If you're just doing matrix multiplication, I would advise that mahout
> (or any mapreduce approach) isn't well suited to your problem. I did
> the same computation with matlab (multiplying two 40k x 40k random
> double precision dense matrices) using 14 cores and about 36GB of ram
> on a single machine* and it finished in about 55 minutes. If I'm
> reading your email correctly, you were working with 34*2*4=272 cores!
> I'm not sure if dense matrix multiplication can actually be
> efficiently mapreduced, but I am still a rookie so don't take my word
> for it.
>
> *The machine I am working on has 8 dual core AMD opteron 875s @ 2.2GHz
> per core, with 64GB total system memory.
>
> Steven Buss
> steven.buss@gmail.com
> http://www.stevenbuss.com/
>
> On Sun, Apr 11, 2010 at 11:53 PM, Ted Dunning wrote:
>> Vimal,
>>
>> We don't have any distributed dense multiplication operations because we
>> have not yet found much application demand for distributed dense matrix
>> multiplication. Distributed sparse matrix operations are a big deal,
>> however.
>>
>> If you are interested in working on the problem in the context of Mahout, we
>> would love to help. This is especially true if you have an application that
>> needs dense operations and could benefit from some of the other capabilities
>> in Mahout.
>>
>> On Sun, Apr 11, 2010 at 1:27 PM, Vimal Mathew wrote:
>>
>>> Hi,
>>> What's the current state of matrix-matrix multiplication in Mahout?
>>> Are there any performance results available for large matrices?
>>>
>>> I have been working on a Hadoop-compatible distributed storage for
>>> matrices. I can currently multiply two 40K x 40K dense double
>>> precision matrices in around 1 hour using 34 systems (16GB RAM, two
>>> Core2Quads per node). I was wondering how this compares with Mahout.
>>>
>>> Regards,
>>> Vimal
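Vimal's point that the naive algorithm parallelizes well once the data is local can be illustrated with a small sketch. Each output tile C[I, J] depends only on the row band A[I, :] and the column band B[:, J], so tiles can be computed as fully independent tasks with no reduction step. This is a minimal sketch under stated assumptions: matrices are in-memory Python lists of lists, a ThreadPoolExecutor stands in for cluster nodes, and the names `naive_matmul`, `multiply_block` and `block_matmul` are hypothetical. They are not Mahout APIs and not Vimal's storage system, and getting the bands onto each node (the storage problem discussed above) is deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]


def naive_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Textbook triple loop: C[i][j] = sum_k A[i][k] * B[k][j]."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def multiply_block(a: Matrix, b: Matrix, rows: range, cols: range) -> Matrix:
    """Compute one output tile C[rows, cols] with the naive triple loop.

    Only the row band A[rows, :] and the column band B[:, cols] are read,
    so a worker holding those two bands locally needs nothing else.
    """
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in cols]
            for i in rows]


def block_matmul(a: Matrix, b: Matrix, block: int = 2, workers: int = 4) -> Matrix:
    """Split C into tiles of at most block x block and compute them independently."""
    if block < 1:
        raise ValueError("block must be >= 1")
    n, m = len(a), len(b[0])
    row_bands = [range(r, min(r + block, n)) for r in range(0, n, block)]
    col_bands = [range(c, min(c + block, m)) for c in range(0, m, block)]
    c = [[0.0] * m for _ in range(n)]
    # Each tile is an independent task; threads stand in for cluster nodes here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = {pool.submit(multiply_block, a, b, rows, cols): (rows, cols)
                 for rows in row_bands for cols in col_bands}
        # Assemble the result: no tile depends on any other tile.
        for future, (rows, cols) in tasks.items():
            tile = future.result()
            for ti, i in enumerate(rows):
                for tj, j in enumerate(cols):
                    c[i][j] = tile[ti][tj]
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(block_matmul(a, b, block=2))  # [[4.0, 5.0], [10.0, 11.0], [16.0, 17.0]]
```

In CPython the threads will not speed up the pure-Python arithmetic, because the GIL serializes it; the sketch only shows that the tiles have no dependencies on one another, which is what makes the naive algorithm easy to distribute once each node already holds its bands of A and B.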
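For a rough sense of scale, the figures quoted in this thread can be worked through as back-of-the-envelope arithmetic. This assumes 8-byte doubles, the standard 2n^3 floating-point operation count for the naive algorithm, and the approximate run times stated above (55 minutes for the single Matlab machine, about 1 hour for the 34-node setup); it ignores communication and I/O.

```latex
\begin{aligned}
n &= 40{,}000 \\
\text{one matrix} &= n^2 \times 8\ \text{B} = 1.28\times10^{10}\ \text{B} \approx 11.9\ \text{GiB} \\
A,\ B,\ C &\approx 3 \times 11.9\ \text{GiB} \approx 35.8\ \text{GiB} \\
\text{work} &= 2n^3 = 1.28\times10^{14}\ \text{flop} \\
\text{single machine, 55 min:}\quad & 1.28\times10^{14} / 3300\ \text{s} \approx 3.9\times10^{10}\ \text{flop/s} \\
\text{34 nodes, 1 h:}\quad & 1.28\times10^{14} / 3600\ \text{s} \approx 3.6\times10^{10}\ \text{flop/s}
\end{aligned}
```

The three-matrix footprint is consistent with the roughly 36GB of RAM Steven reports, and both setups land at a similar aggregate rate, even though the cluster run uses about 272 cores against 14.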