systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-1548) Performance ultra-sparse matrix read
Date Fri, 21 Apr 2017 02:21:04 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm updated SYSTEMML-1548:
-------------------------------------
    Description: 
Reading ultra-sparse matrices shows poor performance for certain data sizes and memory
configurations due to garbage collection overheads.

In detail, this task covers two scenarios that will be addressed independently:

1) Large heap: In case of large heaps, the problem is temporarily deserialized sparse blocks
that are not reused due to inefficient reset, leading to lots of garbage and hence high
cost for full garbage collection. This will be addressed by using our CSR sparse blocks for
ultra-sparse blocks because CSR has a smaller memory footprint and allows efficient reset.
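
To illustrate why a CSR block can be reset and reused cheaply, here is a minimal Python sketch. It is not SystemML's actual sparse block implementation; the class name CSRBlock and its methods are hypothetical. The point is only that CSR keeps all non-zeros in a few flat arrays, so a reset clears the row pointers and a counter while the index and value arrays stay allocated for the next deserialized block.

```python
from array import array


class CSRBlock:
    """Hypothetical CSR block: three flat arrays shared by all rows."""

    def __init__(self, rows, capacity):
        self.rows = rows
        # Row i occupies positions row_ptr[i] .. row_ptr[i + 1] - 1.
        self.row_ptr = array("i", [0] * (rows + 1))
        self.col_idx = array("i", [0] * capacity)
        self.values = array("d", [0.0] * capacity)
        self.nnz = 0

    def reset(self):
        # Only the row pointers and the counter are cleared. The index and
        # value arrays keep their allocation; stale entries are unreachable.
        for i in range(len(self.row_ptr)):
            self.row_ptr[i] = 0
        self.nnz = 0

    def load(self, triples):
        """Refill the block from (row, col, value) triples sorted by row."""
        self.reset()
        for r, c, v in triples:
            if self.nnz == len(self.col_idx):
                raise ValueError("capacity exceeded")
            self.col_idx[self.nnz] = c
            self.values[self.nnz] = v
            self.nnz += 1
            self.row_ptr[r + 1] = self.nnz
        # Empty rows inherit the end position of the previous row.
        for i in range(1, self.rows + 1):
            self.row_ptr[i] = max(self.row_ptr[i], self.row_ptr[i - 1])

    def row(self, i):
        lo, hi = self.row_ptr[i], self.row_ptr[i + 1]
        return list(zip(self.col_idx[lo:hi], self.values[lo:hi]))
```

A reader thread could hold one such block and call load() for every block it deserializes, so no per-block arrays become garbage.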

2) Small heap: In case of small heaps, not the temporary blocks but the memory overhead of
the target sparse matrix becomes the bottleneck. This is due to a relatively large memory
overhead per sparse row, which is not amortized if a row has just one or very few non-zeros.
This will be addressed via a modification of the MCSR representation for ultra-sparse matrices.
Note that we cannot use CSR or COO here because we want to support efficient multi-threaded
incremental construction.
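
The amortization argument can be made concrete with a rough back-of-the-envelope estimate in Python. All byte counts below are illustrative assumptions, not measured SystemML or JVM figures: 12 bytes per non-zero (4-byte column index plus 8-byte value), an 8-byte reference per row, and a hypothetical fixed overhead of 64 bytes per MCSR row object and its array headers. CSR is shown only for comparison.

```python
NNZ_BYTES = 12        # assumed: 4-byte column index + 8-byte double value
REF_BYTES = 8         # assumed: one reference per row in the row array
ROW_OVERHEAD = 64     # assumed: object and array headers per MCSR row


def mcsr_bytes(rows, nnz_per_row, row_overhead=ROW_OVERHEAD):
    """Estimate for a row-wise layout where every row owns its own arrays."""
    return rows * (REF_BYTES + row_overhead + NNZ_BYTES * nnz_per_row)


def csr_bytes(rows, nnz_per_row):
    """Estimate for CSR: one row-pointer array plus flat index/value arrays."""
    return 4 * (rows + 1) + NNZ_BYTES * rows * nnz_per_row


def mcsr_overhead_fraction(nnz_per_row, row_overhead=ROW_OVERHEAD):
    """Share of MCSR memory that is not payload (indexes and values)."""
    total = REF_BYTES + row_overhead + NNZ_BYTES * nnz_per_row
    return (total - NNZ_BYTES * nnz_per_row) / total


if __name__ == "__main__":
    for k in (1, 10, 1000):
        print(k, mcsr_bytes(1_000_000, k), csr_bytes(1_000_000, k),
              round(mcsr_overhead_fraction(k), 3))
```

Under these assumed constants, the fixed per-row cost dominates when a row holds a single non-zero and becomes negligible for rows with many non-zeros. The estimate does not capture the construction side of the trade-off: in a row-wise layout, threads can fill disjoint rows independently, whereas CSR's shared flat arrays would need coordination for incremental inserts.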

  was: Reading ultra-sparse matrices shows poor performance for certain data sizes and memory
configurations due to garbage collection overheads.


> Performance ultra-sparse matrix read
> ------------------------------------
>
>                 Key: SYSTEMML-1548
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1548
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
