systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <>
Subject [jira] [Commented] (SYSTEMML-1140) Sparse/Caching performance bugs related to deep learning scripts
Date Sat, 03 Jun 2017 00:12:04 GMT


Matthias Boehm commented on SYSTEMML-1140:

OK, after a closer look, this issue boils down to the serialization overhead of sparse matrices
(in MCSR format) on buffer pool write (i.e., {{MatrixObject.release}} after {{MatrixObject.acquireModify}}).
It can be solved by extending our "shallow serialize" to sparse matrices in MCSR format. Shallow serialize
(which simply keeps a strong reference instead of serializing the matrix) is currently used
only for dense matrices and sparse matrices in CSR format because their in-memory size is
equivalent to their serialized size. For MCSR (our default sparse block), the in-memory representation
has some overhead, so serialization helps to avoid unnecessary evictions to disk. However,
as the number of columns (or nnz, i.e., non-zeros, per row) grows, this overhead becomes negligible.
Hence, we should establish an overhead threshold of, say, 30% and use shallow serialize whenever the
overhead is below that threshold. This change should be in master tomorrow.
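
To make the proposed threshold rule concrete, here is a minimal sketch in Python (not SystemML code). The size model is an assumption for illustration only: the byte constants and the function names mcsr_overhead and use_shallow_serialize are hypothetical and do not reflect SystemML's actual memory estimates. It only shows how a fixed per-row overhead shrinks relative to the payload as nnz per row grows.

```python
# Illustrative size model; every constant below is an assumption,
# not a value taken from SystemML.
BYTES_PER_NNZ = 12            # assumed: 4-byte column index + 8-byte double value
ROW_OVERHEAD_IN_MEMORY = 64   # assumed: per-row object and array headers (MCSR-like)
ROW_OVERHEAD_SERIALIZED = 4   # assumed: per-row nnz count in the serialized form
OVERHEAD_THRESHOLD = 0.30     # the "say 30%" threshold from the comment


def mcsr_overhead(rows, nnz):
    """Relative overhead of the in-memory form over the serialized form."""
    in_memory = rows * ROW_OVERHEAD_IN_MEMORY + nnz * BYTES_PER_NNZ
    serialized = rows * ROW_OVERHEAD_SERIALIZED + nnz * BYTES_PER_NNZ
    if serialized == 0:
        return 0.0  # empty block: nothing to serialize, no overhead
    return in_memory / serialized - 1.0


def use_shallow_serialize(rows, nnz, threshold=OVERHEAD_THRESHOLD):
    """Keep a strong reference (no serialization) if the overhead is small."""
    return mcsr_overhead(rows, nnz) < threshold


if __name__ == "__main__":
    for nnz_per_row in (2, 10, 100, 1000):
        rows = 1000
        nnz = rows * nnz_per_row
        print(nnz_per_row, round(mcsr_overhead(rows, nnz), 3),
              use_shallow_serialize(rows, nnz))
```

Under these assumed constants, a block with very few non-zeros per row would still be serialized, while a block with many non-zeros per row would fall under the threshold and use shallow serialize.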

> Sparse/Caching performance bugs related to deep learning scripts
> ----------------------------------------------------------------
>                 Key: SYSTEMML-1140
>                 URL:
>             Project: SystemML
>          Issue Type: Bug
>    Affects Versions: SystemML 1.0
>            Reporter: Niketan Pansare
>            Priority: Blocker
> We have identified two performance bugs that frequently occur in deep learning scripts.
> First, we repeatedly perform unnecessary conversions to sparse format. Also, operations
> such as matrix multiplication (including BLAS and cuBLAS) are optimized for dense.
> Second, even with a large memory budget, we sometimes spend almost 20-30% of the time in caching.
> [~mboehm7] [~reinwald] [] I am labeling this bug as a blocker for SystemML
> 1.0. Please feel free to assign this issue to yourself.

This message was sent by Atlassian JIRA
