systemml-issues mailing list archives

From "Matthias Boehm (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SYSTEMML-1752) Cache-conscious mmchain matrix multiply for wide matrices
Date Sat, 08 Jul 2017 08:03:00 GMT

     [ https://issues.apache.org/jira/browse/SYSTEMML-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm updated SYSTEMML-1752:
-------------------------------------
    Description: 
The fused mmchain matrix multiply for patterns such as {{t(X) %*% (w * (X %*% v))}} uses
row-wise {{dotProduct}} and {{vectMultAdd}} operations, which work very well for the common
case of tall and skinny matrices where individual rows fit into L1 cache. However, for graph
and text scenarios with wide matrices, this leads to cache thrashing on the input and output
vectors.
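
For illustration, the row-wise pattern described above can be sketched as follows for a dense, row-major X. This is a minimal sketch and not the actual SystemML code: the class name {{MMChainRowwise}} and the signatures of {{dotProduct}} and {{vectMultAdd}} are assumptions made for this example. Each row touches all n entries of {{v}} and of the output vector, which is why wide rows evict those vectors from cache.

```java
import java.util.Arrays;

// Hypothetical sketch of the row-wise fused mmchain, t(X) %*% (w * (X %*% v)),
// for a dense m x n matrix X stored row-major in a flat array.
class MMChainRowwise {
    // Dot product of the X row starting at offset xoff with v (assumed signature).
    static double dotProduct(double[] X, double[] v, int xoff, int n) {
        double s = 0;
        for (int j = 0; j < n; j++) {
            s += X[xoff + j] * v[j];
        }
        return s;
    }

    // out += a * (X row starting at offset xoff) (assumed signature).
    static void vectMultAdd(double[] X, double a, double[] out, int xoff, int n) {
        for (int j = 0; j < n; j++) {
            out[j] += a * X[xoff + j];
        }
    }

    // One pass over X: every row reads all of v and updates all of out.
    static double[] mmchain(double[] X, double[] v, double[] w, int m, int n) {
        double[] out = new double[n];
        for (int i = 0; i < m; i++) {
            int off = i * n;
            double a = w[i] * dotProduct(X, v, off, n);
            vectMultAdd(X, a, out, off, n);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] X = {1, 2, 3, 4, 5, 6}; // 3 x 2 matrix, row-major
        double[] v = {1, 1};
        double[] w = {1, 2, 3};
        // prints [210.0, 260.0]
        System.out.println(Arrays.toString(mmchain(X, v, w, 3, 2)));
    }
}
```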

This task aims to generalize these dense and sparse operations to perform the computation
in a cache-conscious manner when necessary, by accessing fragments of the input and output
vectors for groups of rows. For dense inputs this is trivial to realize, while for sparse
inputs it requires a careful determination of the block sizes according to the input sparsity.
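
One possible shape of the dense case is sketched below, assuming a row-major X and caller-chosen block sizes. The class name {{MMChainBlocked}} and the parameters {{blockRows}} and {{blockCols}} are hypothetical, and the sketch does not attempt the sparsity-dependent block-size selection that the sparse case would need. For each group of rows, it first accumulates the dot products one fragment of {{v}} at a time, then updates the output one fragment at a time, so each fragment is reused across the whole row group while it is still in cache.

```java
import java.util.Arrays;

// Hypothetical cache-conscious variant of t(X) %*% (w * (X %*% v)) for a dense,
// row-major m x n matrix X. Block sizes are supplied by the caller.
class MMChainBlocked {
    static double[] mmchainBlocked(double[] X, double[] v, double[] w,
                                   int m, int n, int blockRows, int blockCols) {
        double[] out = new double[n];
        double[] tmp = new double[blockRows]; // per-row intermediates for one row group

        for (int bi = 0; bi < m; bi += blockRows) {
            int rows = Math.min(blockRows, m - bi);
            Arrays.fill(tmp, 0, rows, 0.0);

            // Phase 1: partial dot products, one fragment of v per column block.
            for (int bj = 0; bj < n; bj += blockCols) {
                int cols = Math.min(blockCols, n - bj);
                for (int i = 0; i < rows; i++) {
                    int off = (bi + i) * n + bj;
                    double s = 0;
                    for (int j = 0; j < cols; j++) {
                        s += X[off + j] * v[bj + j];
                    }
                    tmp[i] += s;
                }
            }

            // Apply the element-wise weights w to the finished dot products.
            for (int i = 0; i < rows; i++) {
                tmp[i] *= w[bi + i];
            }

            // Phase 2: scaled row additions, one fragment of out per column block.
            for (int bj = 0; bj < n; bj += blockCols) {
                int cols = Math.min(blockCols, n - bj);
                for (int i = 0; i < rows; i++) {
                    int off = (bi + i) * n + bj;
                    double a = tmp[i];
                    for (int j = 0; j < cols; j++) {
                        out[bj + j] += a * X[off + j];
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] X = {1, 2, 3, 4, 5, 6}; // 3 x 2 matrix, row-major
        double[] v = {1, 1};
        double[] w = {1, 2, 3};
        // prints [210.0, 260.0]
        System.out.println(Arrays.toString(mmchainBlocked(X, v, w, 3, 2, 2, 1)));
    }
}
```

The trade-off in this sketch is that each row group of X is traversed twice, so the row group itself should also stay cache-resident for the blocking to pay off.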
     Issue Type: Task  (was: Bug)

> Cache-conscious mmchain matrix multiply for wide matrices
> ---------------------------------------------------------
>
>                 Key: SYSTEMML-1752
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1752
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm



