Date: Thu, 07 Apr 2016 17:50:37 +0200
From: Gilles <gilles@harfang.homelinux.org>
To: user@commons.apache.org
Subject: Re: [math] matrix multiplication surprisingly slow?
Message-ID: <638c67a2373bad3084f9b7eba35034fa@scarlet.be>

On Thu, 7 Apr 2016 16:17:52 +0100, Chris Lucas wrote:
> Thanks for suggesting the BlockRealMatrix. I've run a few benchmarks
> comparing the two, along with some other libraries.

Email is not really suited for tables (see your message below).
What are the points which you want to make?

As said before, "reports" on CM features (benchmarks, among others)
are welcome, but they should be formatted for inclusion in the
userguide (for now, until we create a separate document for data that
is subject to change more or less rapidly).

If you suspect a problem with the code, then I suggest that you file a
JIRA report, where files can be attached (and tables can be viewed
without being deformed, thanks to the "{noformat}" tag).

Regards,
Gilles
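P.S. For readers of the archive: the two implementations being
compared below hold the same data in different layouts.
Array2DRowRealMatrix wraps a plain row-major double[][], while
BlockRealMatrix stores its entries in square blocks (52x52 in Commons
Math 3.x) so that multiplication walks memory with better cache
locality. A minimal sketch, assuming commons-math3 on the classpath
(the class name is hypothetical):

----
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.BlockRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;

public class LayoutSketch {
    public static void main(String[] args) {
        double[][] vals = new double[500][500];
        for (int i = 0; i < 500; i++) {
            for (int j = 0; j < 500; j++) {
                vals[i][j] = Math.random();
            }
        }
        // Same values, two storage layouts.
        RealMatrix rowMajor = new Array2DRowRealMatrix(vals);
        RealMatrix blocked  = new BlockRealMatrix(vals);

        // Each implementation runs its own multiply() loop; the
        // blocked one is expected to be friendlier to the cache.
        System.out.println(rowMajor.multiply(rowMajor).getEntry(0, 0));
        System.out.println(blocked.multiply(blocked).getEntry(0, 0));
    }
}
----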
> I'm using jmh 1.11.3, JDK 1.8.0_05, VM 25.5-b02, with 2 warmups and
> 10 test iterations.
>
> The suffixes denote what's being tested:
>   MC:  Apache Commons Math 3.6.1 using Array2DRowRealMatrix
>   SB:  Scala Breeze 0.12 with native libraries
>   OJ:  OJAlgo 39.0 (some other benchmarks suggest OJAlgo is an
>        especially speedy pure-Java library)
>   BMC: Commons Math using a BlockRealMatrix
>
> I'm using the same matrix creation/multiplication/inverse code as I
> mentioned in my previous note. When testing BlockRealMatrix, I'm
> using fixed random matrices and annotating my class with
> @State(Scope.Benchmark). For the curious, my rationale for building
> matrices on the spot is that I'm already caching results pretty
> heavily and rarely perform the same operation on the same matrix
> twice -- I'm most curious about performance in the absence of
> caching benefits.
>
> Test 1a: Creating and multiplying two random/uniform (i.e., all
> entries drawn from Math.random()) 100x100 matrices:
>   MatrixBenchmarks.buildMultTestMC100  thrpt  10   836.579 ±   7.120  ops/s
>   MatrixBenchmarks.buildMultTestSB100  thrpt  10  1649.089 ± 170.649  ops/s
>   MatrixBenchmarks.buildMultTestOJ100  thrpt  10  1485.014 ±  44.158  ops/s
>   MatrixBenchmarks.multBMC100          thrpt  10  1051.055 ±   2.290  ops/s
>
> Test 1b: Creating and multiplying two random/uniform 500x500
> matrices:
>   MatrixBenchmarks.buildMultTestMC500  thrpt  10     8.997 ±   0.064  ops/s
>   MatrixBenchmarks.buildMultTestSB500  thrpt  10    80.558 ±   0.627  ops/s
>   MatrixBenchmarks.buildMultTestOJ500  thrpt  10    34.626 ±   2.505  ops/s
>   MatrixBenchmarks.multBMC500          thrpt  10     9.132 ±   0.059  ops/s
>   MatrixBenchmarks.multCacheBMC500     thrpt  10    13.630 ±   0.107  ops/s  [no matrix creation]
>
> ---
>
> Test 2a: Creating a single 500x500 matrix (to get a sense of whether
> the mult itself is driving the differences):
>   MatrixBenchmarks.buildMC500          thrpt  10   155.026 ±   2.456  ops/s
>   MatrixBenchmarks.buildSB500          thrpt  10   197.257 ±   4.619  ops/s
>   MatrixBenchmarks.buildOJ500          thrpt  10   176.036 ±   2.121  ops/s
>
> Test 2b: Creating a single 1000x1000 matrix (to show it scales w/N):
>   MatrixBenchmarks.buildMC1000         thrpt  10    36.112 ±   2.775  ops/s
>   MatrixBenchmarks.buildSB1000         thrpt  10    45.557 ±   2.893  ops/s
>   MatrixBenchmarks.buildOJ1000         thrpt  10    47.438 ±   1.370  ops/s
>   MatrixBenchmarks.buildBMC1000        thrpt  10    37.779 ±   0.871  ops/s
>
> ---
>
> Test 3a: Inverting a random 100x100 matrix:
>   MatrixBenchmarks.invMC100            thrpt  10  1033.011 ±   9.928  ops/s
>   MatrixBenchmarks.invSB100            thrpt  10  1664.833 ±  15.170  ops/s
>   MatrixBenchmarks.invOJ100            thrpt  10  1174.044 ±  52.083  ops/s
>   MatrixBenchmarks.invBMC100           thrpt  10  1043.858 ±  22.144  ops/s
>
> Test 3b: Inverting a random 500x500 matrix:
>   MatrixBenchmarks.invMC500            thrpt  10     9.089 ±   0.284  ops/s
>   MatrixBenchmarks.invSB500            thrpt  10    43.857 ±   1.083  ops/s
>   MatrixBenchmarks.invOJ500            thrpt  10    23.444 ±   1.484  ops/s
>   MatrixBenchmarks.invBMC500           thrpt  10     9.156 ±   0.052  ops/s
>   MatrixBenchmarks.invCacheBMC500      thrpt  10     9.627 ±   0.084  ops/s  [no matrix creation]
>
> Test 3c: Inverting a random 1000x1000 matrix:
>   MatrixBenchmarks.invMC1000           thrpt  10     0.704 ±   0.040  ops/s
>   MatrixBenchmarks.invSB1000           thrpt  10     8.611 ±   0.557  ops/s
>   MatrixBenchmarks.invOJ1000           thrpt  10     3.539 ±   0.229  ops/s
>   MatrixBenchmarks.invBMC1000          thrpt  10     0.691 ±   0.095  ops/s
>
> Also, isn't matrix multiplication at least O(N^2.37), rather than
> O(N^2)?
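On the complexity aside: the schoolbook algorithm that general-purpose
libraries actually implement is O(N^3) -- N^2 output entries, each an
N-term dot product. O(N^2.37) is the best known theoretical exponent
(Coppersmith-Winograd and its successors), and O(N^2) is only the
lower bound implied by reading the input; neither describes what runs
in practice, where constants and memory locality dominate. Your own
numbers look consistent with O(N^3): inversion (also a ~N^3 operation
via LU) drops from 9.089 to 0.704 ops/s when N doubles from 500 to
1000, a factor of ~13 -- roughly the 8x that the arithmetic alone
predicts, with the remainder plausibly due to cache effects.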
> On 6 April 2016 at 12:55, Chris Lucas wrote:
>
>> Thanks for the quick reply!
>>
>> I've pasted a small self-contained example at the bottom. It
>> creates the matrices in advance, but nothing meaningful changes if
>> they're created on a per-operation basis.
>>
>> Results for 50 multiplications of [size]x[size] matrices:
>>   Size:  10, total time:  0.012 seconds, time per: 0.000 seconds
>>   Size: 100, total time:  0.062 seconds, time per: 0.001 seconds
>>   Size: 300, total time:  3.050 seconds, time per: 0.061 seconds
>>   Size: 500, total time: 15.186 seconds, time per: 0.304 seconds
>>   Size: 600, total time: 17.532 seconds, time per: 0.351 seconds
>>
>> For comparison:
>>
>> Results for 50 additions of [size]x[size] matrices (which should be
>> faster, but the extent of the difference is nonetheless striking to
>> me):
>>   Size:  10, total time:  0.011 seconds, time per: 0.000 seconds
>>   Size: 100, total time:  0.012 seconds, time per: 0.000 seconds
>>   Size: 300, total time:  0.020 seconds, time per: 0.000 seconds
>>   Size: 500, total time:  0.032 seconds, time per: 0.001 seconds
>>   Size: 600, total time:  0.050 seconds, time per: 0.001 seconds
>>
>> Results for 50 inversions of a [size]x[size] matrix, which I'd
>> expect to be slower than multiplication for larger matrices:
>>   Size:  10, total time:  0.014 seconds, time per: 0.000 seconds
>>   Size: 100, total time:  0.074 seconds, time per: 0.001 seconds
>>   Size: 300, total time:  1.005 seconds, time per: 0.020 seconds
>>   Size: 500, total time:  5.024 seconds, time per: 0.100 seconds
>>   Size: 600, total time:  9.297 seconds, time per: 0.186 seconds
>>
>> I hope this is useful, and if I'm doing something wrong that's
>> leading to this performance gap, I'd love to know.
>>
>> ----
>> import org.apache.commons.math3.linear.Array2DRowRealMatrix;
>> import org.apache.commons.math3.linear.LUDecomposition;
>> import org.apache.commons.math3.linear.RealMatrix;
>>
>> public class AMCMatrices {
>>
>>     public static void main(String[] args) {
>>         miniTest(0); // 0 = multiply, 1 = add, 2 = invert
>>     }
>>
>>     public static void miniTest(int tType) {
>>         int samples = 50;
>>         int[] sizes = { 10, 100, 300, 500, 600 };
>>
>>         for (int sI = 0; sI < sizes.length; sI++) {
>>             int mSize = sizes[sI];
>>
>>             RealMatrix m0 = buildM(mSize, mSize);
>>             RealMatrix m1 = buildM(mSize, mSize);
>>
>>             long start = System.nanoTime();
>>             for (int n = 0; n < samples; n++) {
>>                 switch (tType) {
>>                 case 0:
>>                     m0.multiply(m1);
>>                     break;
>>                 case 1:
>>                     m0.add(m1);
>>                     break;
>>                 case 2:
>>                     new LUDecomposition(m0).getSolver().getInverse();
>>                     break;
>>                 }
>>             }
>>             long end = System.nanoTime();
>>
>>             double dt = (end - start) / 1e9;
>>             System.out.println(String.format(
>>                 "Size: %d, total time: %3.3f seconds, time per: %3.3f seconds",
>>                 mSize, dt, dt / samples));
>>         }
>>     }
>>
>>     public static Array2DRowRealMatrix buildM(int r, int c) {
>>         double[][] matVals = new double[r][c];
>>         for (int i = 0; i < r; i++) {
>>             for (int j = 0; j < c; j++) {
>>                 matVals[i][j] = Math.random();
>>             }
>>         }
>>         return new Array2DRowRealMatrix(matVals);
>>     }
>> }
>> ----
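One caveat about that hand-rolled timing loop: none of the computed
results is ever used, so a sufficiently aggressive JIT is in principle
free to discard part of the work as dead code (this is one of the
things JMH's Blackhole guards against). A cheap guard -- a sketch,
with a hypothetical "sink" accumulator, as a drop-in replacement for
the multiplication branch of the loop above -- is to keep each result
observably live:

----
// Fold one entry of each product into an accumulator that is
// eventually printed, so the loop body cannot be optimized away.
double sink = 0;
for (int n = 0; n < samples; n++) {
    sink += m0.multiply(m1).getEntry(0, 0);
}
System.out.println("sink = " + sink);
----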
>> On 5 April 2016 at 19:36, Gilles wrote:
>>
>>> Hi.
>>>
>>> On Tue, 5 Apr 2016 15:43:04 +0100, Chris Lucas wrote:
>>>
>>>> I recently ran a benchmark comparing the performance of Commons
>>>> Math 3.6.1's linear algebra library to that of Scala Breeze
>>>> (https://github.com/scalanlp/breeze).
>>>>
>>>> I looked at det, inverse, Cholesky factorization, addition, and
>>>> multiplication, including matrices of size 10, 100, 500, and
>>>> 1000, with symmetric, non-symmetric, and non-square cases where
>>>> applicable.
>>>
>>> It would be interesting to add this to the CM documentation:
>>> https://issues.apache.org/jira/browse/MATH-1194
>>>
>>>> In general, I was pleasantly surprised: Commons Math performed
>>>> about as well as Breeze, despite the latter relying on native
>>>> libraries. There was one exception, however:
>>>>
>>>>   m0.multiply(m1)
>>>>
>>>> where m0 and m1 are both Array2DRowRealMatrix instances. It
>>>> scaled very poorly in Commons Math, being much slower than
>>>> nominally more expensive operations like inv, and much slower
>>>> than the Breeze implementation. Does anyone have a thought as to
>>>> what's going on?
>>>
>>> Could you provide more information, such as a plot of performance
>>> vs size? A self-contained and minimal code to run would be nice
>>> too. See the CM micro-benchmarking tool here:
>>> https://github.com/apache/commons-math/blob/master/src/test/java/org/apache/commons/math4/PerfTestUtils.java
>>> And an example of how to use it:
>>> https://github.com/apache/commons-math/blob/master/src/userguide/java/org/apache/commons/math4/userguide/FastMathTestPerformance.java
>>>
>>>> In case it's useful, one representative test involves multiplying
>>>> two instances of
>>>>
>>>>   new Array2DRowRealMatrix(matVals)
>>>>
>>>> where matVals is 1000x1000 entries of Math.random() and the
>>>> second instance is created as part of the loop. This part of the
>>>> benchmark is not specific to the expensive multiplication step,
>>>> and takes very little time relative to the multiplication itself.
>>>> I'm using System.nanoTime() for the timing, and take the average
>>>> time over several consecutive iterations, on a 3.5 GHz Intel Core
>>>> i7, Oracle JRE (build 1.8.0_05-b13).
>>>
>>> You might want to confirm the behaviour using JMH (becoming the
>>> Java standard for benchmarking):
>>> http://openjdk.java.net/projects/code-tools/jmh/
>>>
>>> Best regards,
>>> Gilles
>>>
>>>> Thanks,
>>>> Chris
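Following up on the JMH pointer above: a minimal JMH harness in the
spirit of the numbers at the top of this thread -- class and method
names here are hypothetical, and it assumes jmh-core (plus its
annotation processor) and commons-math3 on the classpath -- might look
like this:

----
import org.apache.commons.math3.linear.BlockRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Fixed random matrices shared across invocations, so only the
// multiplication itself is timed (cf. the "Cache" variants above).
@State(Scope.Benchmark)
public class MatrixMultBenchmark {

    private RealMatrix m0;
    private RealMatrix m1;

    @Setup
    public void setup() {
        m0 = random(500, 500);
        m1 = random(500, 500);
    }

    @Benchmark
    public RealMatrix multBMC500() {
        // Returning the product lets JMH consume it, which prevents
        // dead-code elimination of the multiply.
        return m0.multiply(m1);
    }

    private static RealMatrix random(int rows, int cols) {
        double[][] v = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                v[i][j] = Math.random();
            }
        }
        return new BlockRealMatrix(v);
    }
}
----

Run with "-wi 2 -i 10" (2 warmup and 10 measurement iterations) to
match the configuration reported above.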
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org