systemml-dev mailing list archives

From Matthias Boehm <>
Subject Re: SYSTEMML-447
Date Fri, 11 May 2018 04:44:38 GMT
This particular JIRA is only partially related. Niketan and Nakul
worked out the details - the only reason I show up as the reporter is
that, if I remember correctly, we split a larger-scoped JIRA for
low-level optimizations (GPU, codegen, compression) into individual
JIRAs and created the detailed tasks.

Overall, I believe that sparse GPU operations would be very valuable,
especially in the context of NLP, graphs, and structured data with
categorical features (which often become very sparse after dummy
coding) because in these ultra-sparse scenarios dense operations cause
unnecessary overheads of orders of magnitude (proportional to the
sparsity). However, creating efficient sparse GPU kernels is
challenging due to irregularities (e.g., sparsity skew). Compared to
CPU operations, sparse GPU kernels could still be beneficial,
depending on the data location of inputs/outputs and the GPU's higher
memory bandwidth.
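To make the ultra-sparse overhead concrete, here is a small sketch (SciPy on CPU, purely for illustration, not SystemML code) comparing the work of a dense vs. a CSR matrix-vector multiply at 0.01% density:

```python
import numpy as np
from scipy.sparse import random as sprand

n = 2000
# Ultra-sparse matrix, e.g., categorical features after dummy coding.
A = sprand(n, n, density=1e-4, format="csr", random_state=7)
v = np.random.rand(n)

# A dense kernel touches all n*n cells; a CSR kernel only the nnz cells.
dense_flops = 2 * n * n
sparse_flops = 2 * A.nnz
print(dense_flops / sparse_flops)  # dense does orders of magnitude more work

# Both yield the same result (up to floating-point rounding).
assert np.allclose(A.toarray() @ v, A @ v)
```

The ratio of touched cells is exactly the inverse of the density, which is the "overheads of orders of magnitude" mentioned above.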

Even once we extend the codegen framework to GPUs (which is still on
the roadmap for this year), we would still need dense/sparse
kernels for the individual operations because we want to apply codegen
only if we can benefit from fusion. Right now we call existing
libraries such as cuBLAS and cuDNN, and have dense kernels for a
subset of operations such as unary, binary, and unary aggregates.
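As a toy illustration of why we only want codegen where fusion pays off (NumPy on CPU, standing in for generated GPU kernels): an unfused sum(X * Y) materializes the full intermediate X * Y, whereas a fused kernel streams both inputs in one pass:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((1000, 1000))
Y = rng.random((1000, 1000))

# Unfused: allocates a 1000x1000 intermediate for X * Y, then reduces it.
unfused = (X * Y).sum()

# "Fused": a single pass with no materialized intermediate
# (np.einsum stands in for a generated fused kernel here).
fused = np.einsum("ij,ij->", X, Y)

assert np.isclose(unfused, fused)
```

For a single unfused operation there is no intermediate to eliminate, which is why the individual dense/sparse kernels are needed regardless of codegen.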

Regarding ramping up on the GPU backend, maybe it's a good idea to
first start with missing dense operations. I'm thinking of statistical
functions (e.g., covariance, moment), parameterized builtin functions
(e.g., grouped aggregates), missing unary and binary operations (e.g.,
bitwise), missing reorg operations (e.g., reshape, sort - there should
be a library for the latter), missing unary, binary, and ternary
aggregates, missing nary (e.g., nary cbind/rbind), etc. Adding these
remaining operations would also help a lot. However, if you're more
interested in contributing to the development of sparse kernels, maybe
you could implement one or two dense operations, get comfortable, and
then move on to sparse operations. Apart from the kernels, seamless support
for sparse operations would also require some integration work on how
we pass data, maintain nnz, preallocate sparse outputs, etc.
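To make those integration concerns concrete, here is a hypothetical CSR sketch (plain Python/SciPy, not SystemML's actual GPU plumbing) of why an operation must size its output buffer up front and maintain an exact nnz afterwards:

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_relu(X: csr_matrix) -> csr_matrix:
    """Apply max(x, 0) to a CSR matrix while maintaining an exact nnz.

    The worst-case output size is X.nnz, so a kernel can preallocate
    that much and compact afterwards (illustrative sketch only).
    """
    out = X.copy()                               # preallocated worst-case output
    out.data = np.where(X.data > 0, X.data, 0.0) # nonzeros that survive the op
    out.eliminate_zeros()                        # compact indices, update nnz
    return out

X = csr_matrix(np.array([[1.0, -2.0, 0.0],
                         [0.0,  3.0, -4.0]]))
R = sparse_relu(X)
print(R.nnz)  # relu drops the two negatives: nnz shrinks from 4 to 2
```

The same bookkeeping (worst-case preallocation, then nnz maintenance) is what the sparse GPU integration would need around each kernel.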


On Thu, May 10, 2018 at 8:47 PM, Janardhan <> wrote:
> Hi Matthias,
> Was this related to long term plan for GPU codegen?
> Thank you,
> Janardhan
