systemml-dev mailing list archives

From Matthias Boehm
Subject Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS
Date Wed, 30 Nov 2016 23:08:27 GMT
Could you please make sure you're comparing the right thing? Even on old 
Sandy Bridge CPUs, our matrix mult for 1kx1k usually takes 40-50ms. We 
also ran the same experiments with larger matrices, and SystemML was 
about 2x faster than Breeze. Please uncomment the timings in 
LibMatrixMult.matrixMult and double check the timing, as well as that 
we're actually comparing dense matrix multiply.
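
As a rough illustration of the two checks above (that the inputs really are dense, and that the reported time is a steady-state average), here is a minimal, language-agnostic sketch in Python. It does not use SystemML or Breeze; `dense_matrix`, `nonzero_fraction`, `matmult` and `average_ms` are hypothetical helper names, and the triple-loop `matmult` is only a stand-in for whichever kernel is being measured. On the JVM the warm-up runs matter even more, since the first iterations include JIT compilation.

```python
import random
import time


def dense_matrix(n, seed):
    """Build an n x n matrix in which every cell is non-zero (fully dense)."""
    rng = random.Random(seed)
    return [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(n)]


def nonzero_fraction(m):
    """Fraction of non-zero cells; 1.0 means the matrix is fully dense."""
    cells = sum(len(row) for row in m)
    nnz = sum(1 for row in m for v in row if v != 0.0)
    return nnz / cells


def matmult(a, b):
    """Plain dense multiply, standing in for the kernel under test."""
    bt = list(zip(*b))  # columns of b, so each dot product walks two rows
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def average_ms(kernel, a, b, warmup=3, iterations=5):
    """Average wall-clock time per call in ms, excluding warm-up runs."""
    for _ in range(warmup):
        kernel(a, b)  # discarded: absorbs one-time startup costs
    start = time.perf_counter()
    for _ in range(iterations):
        kernel(a, b)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations


if __name__ == "__main__":
    n = 64  # kept small because this stand-in kernel is pure Python
    a, b = dense_matrix(n, 1), dense_matrix(n, 2)
    # Confirm both inputs are dense before trusting the timing.
    print("non-zero fraction:", nonzero_fraction(a), nonzero_fraction(b))
    print("average ms per multiply:", average_ms(matmult, a, b))
```

Passing both kernels through the same `average_ms` harness, with the same inputs, keeps the comparison like-for-like.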


On 11/30/2016 11:54 PM, wrote:
> Hi all,
> I have run a very quick comparison between SystemML's LibMatrixMult and
> Breeze matrix multiplication using native BLAS (OpenBLAS through
> netlib-java). In this very small comparison I see a performance
> difference for dense-dense matrices of size 1000 x 1000 (our default
> blocksize), with Breeze being about 5-6 times faster here. The code I
> used can be found here:
> Running this code with 50 iterations each gives me for example average
> times of:
> Breeze:    49.74 ms
> SystemML: 363.44 ms
> I don't want to say this is true for every operation, but those results
> let us form the hypothesis that native BLAS operations can lead to a
> significant speedup for certain operations which is worth testing with
> more advanced benchmarks.
> Btw: I am definitely not saying we should use Breeze here. I am more
> looking at native BLAS and LAPACK implementations in general (as
> provided by OpenBLAS, MKL, etc.).
> Let me know what you think!
> Felix
