commons-user mailing list archives

From Gilles <gil...@harfang.homelinux.org>
Subject Re: [math] matrix multiplication surprisingly slow?
Date Thu, 07 Apr 2016 14:38:57 GMT
On Wed, 6 Apr 2016 12:55:06 +0100, Chris Lucas wrote:
> Thanks for the quick reply!
>
> I've pasted a small self-contained example at the bottom. It creates the
> matrices in advance, but nothing meaningful changes if they're created
> on a per-operation basis.
>
> Results for 50 multiplications of [size]x[size] matrices:
>    Size: 10, total time: 0.012 seconds, time per: 0.000 seconds
>    Size: 100, total time: 0.062 seconds, time per: 0.001 seconds
>    Size: 300, total time: 3.050 seconds, time per: 0.061 seconds
>    Size: 500, total time: 15.186 seconds, time per: 0.304 seconds
>    Size: 600, total time: 17.532 seconds, time per: 0.351 seconds
>
> For comparison:
>
> Results for 50 additions of [size]x[size] matrices (which should be
> faster, but the extent of the difference is nonetheless striking to me):
>    Size: 10, total time: 0.011 seconds, time per: 0.000 seconds
>    Size: 100, total time: 0.012 seconds, time per: 0.000 seconds
>    Size: 300, total time: 0.020 seconds, time per: 0.000 seconds
>    Size: 500, total time: 0.032 seconds, time per: 0.001 seconds
>    Size: 600, total time: 0.050 seconds, time per: 0.001 seconds
>
> Results for 50 inversions of a [size]x[size] matrix, which I'd expect
> to be slower than multiplication for larger matrices:
>    Size: 10, total time: 0.014 seconds, time per: 0.000 seconds
>    Size: 100, total time: 0.074 seconds, time per: 0.001 seconds
>    Size: 300, total time: 1.005 seconds, time per: 0.020 seconds
>    Size: 500, total time: 5.024 seconds, time per: 0.100 seconds
>    Size: 600, total time: 9.297 seconds, time per: 0.186 seconds
>
> I hope this is useful, and if I'm doing something wrong that's leading
> to this performance gap, I'd love to know.

Here is the result of a JMH run (only benchmarking matrix multiplication):
---CUT---
[...]

Benchmark                           Mode  Cnt        Score      Error  Units
MyBenchmarkMatrix.size100x100_a2d  thrpt  200      738.787 ±    8.501  ops/s
MyBenchmarkMatrix.size100x100_b    thrpt  200     1527.522 ±   16.158  ops/s
MyBenchmarkMatrix.size10x10_a2d    thrpt  200   697662.468 ± 4807.916  ops/s
MyBenchmarkMatrix.size10x10_b      thrpt  200  1003800.610 ± 6612.299  ops/s
MyBenchmarkMatrix.size600x600_a2d  thrpt  200        0.669 ±    0.007  ops/s
MyBenchmarkMatrix.size600x600_b    thrpt  200        7.794 ±    0.071  ops/s
---CUT---

where the suffix indicates the layout type:
   a2d -> Array2DRowRealMatrix
     b -> BlockRealMatrix
[So, even for small matrices, "BlockRealMatrix" is a clear win.]
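
For reference, a minimal sketch of a benchmark along those lines (not
the exact code behind the run above; the 600x600 size and the names are
illustrative):
---CUT---
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.BlockRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
public class MyBenchmarkMatrix {
    private RealMatrix a2d0, a2d1, b0, b1;

    @Setup
    public void setUp() {
        final int size = 600; // Illustrative; repeat for 10, 100, etc.
        final double[][] vals = new double[size][size];
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                vals[i][j] = Math.random();
            }
        }
        // Same data, two storage layouts.
        a2d0 = new Array2DRowRealMatrix(vals);
        a2d1 = new Array2DRowRealMatrix(vals);
        b0 = new BlockRealMatrix(vals);
        b1 = new BlockRealMatrix(vals);
    }

    @Benchmark
    public void size600x600_a2d(Blackhole bh) {
        // Feeding the product to the Blackhole prevents the JIT from
        // eliding the multiplication as dead code.
        bh.consume(a2d0.multiply(a2d1));
    }

    @Benchmark
    public void size600x600_b(Blackhole bh) {
        bh.consume(b0.multiply(b1));
    }
}
---CUT---
Note also that "MatrixUtils.createRealMatrix" selects the layout
automatically (IIRC, "BlockRealMatrix" above 4096 elements), so it is a
reasonable default when the size is not known in advance.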

As for the multiplication getting slower for larger sizes, I'd assume
that it's related to the O(N^3) cost of the schoolbook algorithm (where
"N" is the row/column dimension): each of the N^2 entries of the product
requires a dot product of length N.
For the two layouts provided in CM and for the other library, you could
perhaps confirm the exact dependency by fitting the parameters "a", "b",
"c", "d" of the function
   t(N) = a + b N + c N^2 + d N^3
using e.g.
http://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/org/apache/commons/math3/fitting/PolynomialCurveFitter.html
(and more data points).
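
A minimal sketch of such a fit, using the per-multiplication timings
from your message as placeholder data points:
---CUT---
import org.apache.commons.math3.fitting.PolynomialCurveFitter;
import org.apache.commons.math3.fitting.WeightedObservedPoints;

public class FitTiming {
    public static void main(String[] args) {
        // (N, seconds per multiplication); replace with your own
        // measurements, at many more sizes.
        WeightedObservedPoints obs = new WeightedObservedPoints();
        obs.add(100, 0.001);
        obs.add(300, 0.061);
        obs.add(500, 0.304);
        obs.add(600, 0.351);

        // Fit t(N) = a + b N + c N^2 + d N^3; coeff[k] multiplies N^k.
        double[] coeff = PolynomialCurveFitter.create(3).fit(obs.toList());
        System.out.printf("a=%g b=%g c=%g d=%g%n",
                          coeff[0], coeff[1], coeff[2], coeff[3]);
    }
}
---CUT---
With only four points the cubic will interpolate them exactly, hence
the need for more data points before the coefficients mean anything.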


Regards,
Gilles

>
> ----
> import org.apache.commons.math3.linear.Array2DRowRealMatrix;
> import org.apache.commons.math3.linear.LUDecomposition;
> import org.apache.commons.math3.linear.RealMatrix;
>
> public class AMCMatrices {
>
>   public static void main(String[] args) {
>     miniTest(0);
>   }
>
>   // Times "samples" repetitions of one operation for several sizes.
>   // tType selects the operation: 0 = multiply, 1 = add,
>   // 2 = inverse (via LU decomposition).
>   public static void miniTest(int tType) {
>     int samples = 50;
>     int[] sizes = { 10, 100, 300, 500, 600 };
>
>     for (int mSize : sizes) {
>       RealMatrix m0 = buildM(mSize, mSize);
>       RealMatrix m1 = buildM(mSize, mSize);
>
>       long start = System.nanoTime();
>       for (int n = 0; n < samples; n++) {
>         switch (tType) {
>         case 0:
>           // Result is discarded; a JIT could in principle elide the
>           // work, which is one reason to prefer a harness like JMH.
>           m0.multiply(m1);
>           break;
>         case 1:
>           m0.add(m1);
>           break;
>         case 2:
>           new LUDecomposition(m0).getSolver().getInverse();
>           break;
>         }
>       }
>       long end = System.nanoTime();
>
>       double dt = (end - start) / 1e9;
>       System.out.println(String.format(
>           "Size: %d, total time: %3.3f seconds, time per: %3.3f seconds",
>           mSize, dt, dt / samples));
>     }
>   }
>
>   // Builds an r-by-c matrix filled with uniform random values.
>   public static Array2DRowRealMatrix buildM(int r, int c) {
>     double[][] matVals = new double[r][c];
>     for (int i = 0; i < r; i++) {
>       for (int j = 0; j < c; j++) {
>         matVals[i][j] = Math.random();
>       }
>     }
>     return new Array2DRowRealMatrix(matVals);
>   }
> }
>
> ----
>
> On 5 April 2016 at 19:36, Gilles <gilles@harfang.homelinux.org> wrote:
>
>> Hi.
>>
>> On Tue, 5 Apr 2016 15:43:04 +0100, Chris Lucas wrote:
>>
>>> I recently ran a benchmark comparing the performance of Commons Math
>>> 3.6.1's linear algebra library to that of Scala Breeze
>>> (https://github.com/scalanlp/breeze).
>>>
>>> I looked at det, inverse, Cholesky factorization, addition, and
>>> multiplication, including matrices with 10, 100, 500, and 1000
>>> elements, with symmetric, non-symmetric, and non-square cases where
>>> applicable.
>>>
>>
>> It would be interesting to add this to the CM documentation:
>>   https://issues.apache.org/jira/browse/MATH-1194
>>
>>> In general, I was pleasantly surprised: Commons Math performed about
>>> as well as Breeze, despite the latter relying on native libraries.
>>> There was one exception, however:
>>>
>>>     m0.multiply(m1)
>>>
>>> where m0 and m1 are both Array2DRowRealMatrix instances. It scaled
>>> very poorly in Commons Math, being much slower both than nominally
>>> more expensive operations like inv and than the Breeze implementation.
>>> Does anyone have a thought as to what's going on?
>>>
>>
>> Could you provide more information, such as a plot of performance vs
>> size?
>> A self-contained and minimal code example to run would be nice too.
>> See the CM micro-benchmarking tool here:
>> https://github.com/apache/commons-math/blob/master/src/test/java/org/apache/commons/math4/PerfTestUtils.java
>> And an example of how to use it:
>> https://github.com/apache/commons-math/blob/master/src/userguide/java/org/apache/commons/math4/userguide/FastMathTestPerformance.java
>>
>>> In case it's useful, one representative test involves multiplying two
>>> instances of
>>>
>>>     new Array2DRowRealMatrix(matVals)
>>>
>>> where matVals is 1000x1000 entries of Math.random() and the second
>>> instance is created as part of the loop. This part of the benchmark is
>>> not specific to the expensive multiplication step, and takes very
>>> little time relative to the multiplication itself. I'm using
>>> System.nanoTime() for the timing, and take the average time over
>>> several consecutive iterations, on a 3.5 GHz Intel Core i7, Oracle JRE
>>> (build 1.8.0_05-b13).
>>>
>>
>> You might want to confirm the behaviour using JMH (becoming the Java
>> standard for benchmarking):
>>   http://openjdk.java.net/projects/code-tools/jmh/
>>
>>
>> Best regards,
>> Gilles
>>
>>
>>> Thanks,
>>>
>>> Chris



