systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SYSTEMML-1587) Performance ultra-sparse matrix reads
Date Sun, 07 May 2017 03:42:04 GMT
Matthias Boehm created SYSTEMML-1587:
----------------------------------------

             Summary: Performance ultra-sparse matrix reads
                 Key: SYSTEMML-1587
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1587
             Project: SystemML
          Issue Type: Task
            Reporter: Matthias Boehm


We use the MCSR (modified compressed sparse row) format by default for sparse and ultra-sparse
matrices because it allows for efficient incremental construction, including multi-threaded
operations. However, even with SYSTEMML-1548, MCSR is still too inefficient in its memory
consumption, which leads to unnecessary garbage collection overhead.

This task aims to read ultra-sparse matrices (e.g., permutation matrices) into CSR format.
Since CSR does not allow for efficient incremental construction (with multiple unordered input
streams), the approach is to use thread-local COO representations and finally merge them into
a CSR representation. The temporary memory requirements are not problematic because size(CSR)
+ size(COO) < size(MCSR) for ultra-sparse matrices, and the COO representation can be partitioned
across threads.
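
The approach described above could be sketched roughly as follows. This is only an illustrative
Python sketch, not SystemML code: the function names (read_partition_to_coo, merge_coo_to_csr,
read_ultra_sparse), the (row, col, value) cell tuples, and the use of a thread pool are all
assumptions made for the example. Each reader thread fills its own COO buffer, and a final
counting pass plus prefix sum places all cells into the CSR arrays.

```python
from concurrent.futures import ThreadPoolExecutor


def read_partition_to_coo(cells):
    """Collect the non-zero (row, col, value) cells of one input stream
    into a COO buffer that is private to the calling thread."""
    rows, cols, vals = [], [], []
    for r, c, v in cells:
        if v != 0:
            rows.append(r)
            cols.append(c)
            vals.append(v)
    return rows, cols, vals


def merge_coo_to_csr(coo_parts, num_rows):
    """Merge several unordered COO buffers into CSR arrays
    (row_ptr, col_idx, values)."""
    # Pass 1: count the non-zeros of every row across all buffers.
    row_ptr = [0] * (num_rows + 1)
    for rows, _, _ in coo_parts:
        for r in rows:
            row_ptr[r + 1] += 1
    # A prefix sum turns the counts into row start offsets.
    for i in range(num_rows):
        row_ptr[i + 1] += row_ptr[i]

    nnz = row_ptr[num_rows]
    col_idx = [0] * nnz
    values = [0.0] * nnz

    # Pass 2: scatter every cell into the next free slot of its row.
    next_pos = row_ptr[:-1]  # slicing copies, so row_ptr stays intact
    for rows, cols, vals in coo_parts:
        for r, c, v in zip(rows, cols, vals):
            p = next_pos[r]
            col_idx[p] = c
            values[p] = v
            next_pos[r] = p + 1

    # Cells of one row may come from different streams, so sort each
    # row by column index (a no-op for rows with at most one entry).
    for i in range(num_rows):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        if hi - lo > 1:
            pairs = sorted(zip(col_idx[lo:hi], values[lo:hi]))
            col_idx[lo:hi] = [c for c, _ in pairs]
            values[lo:hi] = [v for _, v in pairs]
    return row_ptr, col_idx, values


def read_ultra_sparse(partitions, num_rows, num_threads=2):
    """Read each partition in its own task, then merge the COO parts."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        coo_parts = list(pool.map(read_partition_to_coo, partitions))
    return merge_coo_to_csr(coo_parts, num_rows)
```

For a 4x4 permutation matrix split across two unordered streams, e.g.
read_ultra_sparse([[(2, 0, 1.0), (0, 3, 1.0)], [(3, 1, 1.0), (1, 2, 1.0)]], 4), the sketch
yields one entry per row. The COO buffers are only needed until the merge finishes, which is
why their memory is temporary.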

Note that this change should be done in a consistent manner for all matrix readers (single-threaded/multi-threaded,
all formats).




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
