systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <>
Subject [jira] [Created] (SYSTEMML-1837) Unary aggregate w/ corrections output to large physical blocks
Date Fri, 11 Aug 2017 19:26:00 GMT
Matthias Boehm created SYSTEMML-1837:

             Summary: Unary aggregate w/ corrections output to large physical blocks
                 Key: SYSTEMML-1837
             Project: SystemML
          Issue Type: Bug
            Reporter: Matthias Boehm

Many unary aggregate operations store corrections in additional columns or rows. For example,
{{rowSums(X)}} uses a two-column output to store sums and corrections. In CP, we drop these
corrections immediately after the operation, while in MR and Spark they are dropped after the
final aggregation. The issue is that {{MatrixBlock::dropLastRowsOrColums}} does not actually
drop the corrections but simply shifts all values to their correct starting positions.
Hence, the physical output is larger than what the memory estimates represent. This leads to
unnecessarily large memory consumption during subsequent operations and in the buffer pool,
which can lead to OOMs. This task aims to fix {{MatrixBlock::dropLastRowsOrColums}}.
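
The difference between shifting values and physically dropping the correction can be illustrated with a minimal sketch. It assumes a dense, row-major {{double[]}} block; the class {{CorrectionDropSketch}} and both method names are hypothetical and are not SystemML's actual implementation or the actual fix.

```java
import java.util.Arrays;

// Hypothetical stand-in for a dense row-major block; this is NOT SystemML's MatrixBlock.
final class CorrectionDropSketch {

    // Shift-only variant: moves the kept values to the front of the existing
    // array, but the array keeps its original physical length (rows * cols).
    static double[] dropLastColumnShiftOnly(double[] data, int rows, int cols) {
        int keep = cols - 1;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < keep; c++) {
                // Destination index never exceeds the source index, so a
                // forward in-place copy does not overwrite unread values.
                data[r * keep + c] = data[r * cols + c];
            }
        }
        return data;
    }

    // Compacting variant: copies the kept values into a new array that is
    // sized for the logical output (rows * (cols - 1)).
    static double[] dropLastColumnCompact(double[] data, int rows, int cols) {
        int keep = cols - 1;
        double[] out = new double[rows * keep];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(data, r * cols, out, r * keep, keep);
        }
        return out;
    }

    public static void main(String[] args) {
        // 3 x 2 block: column 0 holds row sums, column 1 holds corrections.
        double[] block = {1.0, 0.1, 2.0, 0.2, 3.0, 0.3};
        double[] shifted = dropLastColumnShiftOnly(block.clone(), 3, 2);
        double[] compact = dropLastColumnCompact(block, 3, 2);
        System.out.println("shift-only: " + shifted.length + " cells, leading values "
            + Arrays.toString(Arrays.copyOf(shifted, 3)));
        System.out.println("compact:    " + compact.length + " cells, values "
            + Arrays.toString(compact));
    }
}
```

In this sketch both variants expose the same logical values (1.0, 2.0, 3.0), but the shift-only array still occupies six cells while the compacted one occupies three. That gap is the kind of mismatch between physical size and memory estimate described above.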

In a subsequent task, we could also modify all unary aggregates to never allocate the multi-column/row
output when executed in CP. However, this requires custom code paths for the different backends.

