ignite-issues mailing list archives

From "Oleg Ignatenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-5535) BLAS support for offheap vector/matrix
Date Wed, 11 Oct 2017 12:58:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200226#comment-16200226 ]

Oleg Ignatenko commented on IGNITE-5535:

I implemented trial code for off-heap BLAS on linux-x86_64 in [branch ignite-5535-1|https://github.com/gridgain/apache-ignite/tree/ignite-5535-1]
(also attached as a patch here: [^IGNITE-5535.BLAS_support_for_offheap_vector_matrix.zip]).

This code was benchmarked against on-heap BLAS working with netlib. Benchmarks show that performance
is about the same. Based on that unexpected outcome I took a closer look at netlib and discovered
that its implementation uses JNI in a way that doesn't require copying data from Java heap
(method [GetPrimitiveArrayCritical|https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical]).

This means that our implementation won't speed up on-heap BLAS handled by netlib. This in turn
limits the expected benefit solely to off-heap data processing and, more specifically, to cases
where copying data to the heap (for further processing with netlib) would not be feasible for
some reason.
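
For illustration, the "copy to heap, then process on-heap" path mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the trial branch: the class {{OffHeapCopySketch}} and its methods are hypothetical, a direct {{ByteBuffer}} stands in for whatever off-heap storage is actually used, and a plain-Java dot product stands in for the on-heap BLAS call handled by netlib.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

/** Hypothetical sketch, not Ignite code: copy off-heap doubles on-heap, then run an on-heap kernel. */
public class OffHeapCopySketch {
    /** Copies every remaining double of the off-heap buffer into a new on-heap array. */
    static double[] copyToHeap(DoubleBuffer offHeap) {
        double[] onHeap = new double[offHeap.remaining()];
        // duplicate() shares the content but has its own position, so the caller's buffer is untouched.
        offHeap.duplicate().get(onHeap);
        return onHeap;
    }

    /** Plain-Java dot product; an assumption standing in for the on-heap BLAS ddot call. */
    static double dot(double[] x, double[] y) {
        if (x.length != y.length)
            throw new IllegalArgumentException("length mismatch");
        double sum = 0.0;
        for (int i = 0; i < x.length; i++)
            sum += x[i] * y[i];
        return sum;
    }

    /** Allocates a direct (off-heap) buffer and fills it with the given values. */
    static DoubleBuffer offHeapOf(double... vals) {
        DoubleBuffer buf = ByteBuffer.allocateDirect(vals.length * Double.BYTES)
            .order(ByteOrder.nativeOrder())
            .asDoubleBuffer();
        buf.put(vals);
        buf.flip(); // make the written values readable from position 0
        return buf;
    }

    public static void main(String[] args) {
        DoubleBuffer x = offHeapOf(1.0, 2.0, 3.0);
        DoubleBuffer y = offHeapOf(4.0, 5.0, 6.0);
        // The extra cost of this path is one on-heap array per operand.
        System.out.println(dot(copyToHeap(x), copyToHeap(y))); // prints 32.0
    }
}
```

The copy costs one additional on-heap array per operand, so this path stops being an option only when the data is too large (or the copy too slow) to hold on-heap alongside the off-heap original.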


In order to account for the change in our expectations (which initially assumed unconditional
improvement) I took a closer look at the expected implementation effort:\\

- Coding. I expect we need to write 10-15K of code in addition to what is already in the trial
implementation, mostly build scripts plus some amount of scaffolding in Java (and maybe a small bit of C).
This estimate is primarily based on what I have seen in the (properly designed) netlib project.

- Build. The trial implementation modified the ml module build in an offensively straightforward way
which is hardly appropriate as a proper part of the project. In particular, packaging is set
to "so" while ml.jar builds only as a secondary target. Also, the trial implementation build won't
work on Windows. And even on Linux, the build introduces some obscure dependencies on parts of the
GCC toolchain that need to be installed in order for it to work.

- Design. The trial implementation takes many shortcuts that need to be addressed to keep the code
maintainable. To start with, the way off-heap data is exposed is rather blunt: the pointer
is plainly passed all the way up through dumb getters from the respective storage into the Vector and
Matrix implementations. Even if this turns out to be structurally the right way (which I
highly doubt), the naming of the methods just doesn't feel right ("ptr").\\
Another important thing is that the trial implementation doesn't take care of platforms where we decide
not to implement this feature (what would be the fallback in these cases?), nor does it take
care of cases when a supported platform doesn't have the cblas library available (side note: in these
cases it would probably make sense to somehow reuse netlib's cblas "fallback" since we have
it anyway). This was okay for a trial implementation but looks totally unacceptable in a proper
part of the project.\\
Last but not least, the trial implementation is designed to work explicitly with concrete
off-heap implementations of Vector and Matrix instead of the respective interfaces. Thought
should be given to whether it is OK to provide a public API like that (and if so, why),
and if it is not, how to redesign it.

- Testing. This feature is platform and library dependent, which means it should be tested
on all platforms we decide to support plus at least one platform that we decide to
ignore. Also, on supported platforms testing has to be done twice: first with the cblas library
available and then without it.
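
The fallback behaviour raised under Design, and the "library present / library absent" configurations raised under Testing, could look roughly like the sketch below. Everything in it is an assumption made for illustration: {{BlasFallbackSketch}}, {{BlasBackend}}, {{JavaBackend}} and the library name {{hypothetical_ignite_cblas}} are hypothetical, and the pure-Java backend is a stand-in rather than netlib's actual cblas fallback.

```java
import java.util.function.Supplier;

/** Hypothetical sketch, not Ignite code: pick a native BLAS backend, fall back when it is unavailable. */
public class BlasFallbackSketch {
    /** Hypothetical minimal backend interface; only ddot is shown. */
    interface BlasBackend {
        String name();
        double ddot(double[] x, double[] y);
    }

    /** Pure-Java backend that is always available (a stand-in for a real fallback). */
    static final class JavaBackend implements BlasBackend {
        @Override public String name() {
            return "java-fallback";
        }

        @Override public double ddot(double[] x, double[] y) {
            if (x.length != y.length)
                throw new IllegalArgumentException("length mismatch");
            double sum = 0.0;
            for (int i = 0; i < x.length; i++)
                sum += x[i] * y[i];
            return sum;
        }
    }

    /**
     * Returns the backend produced by nativeFactory, or the Java fallback when the
     * factory signals that the platform or the native library is not available.
     */
    static BlasBackend select(Supplier<BlasBackend> nativeFactory) {
        try {
            return nativeFactory.get();
        }
        catch (UnsatisfiedLinkError | UnsupportedOperationException e) {
            return new JavaBackend();
        }
    }

    public static void main(String[] args) {
        BlasBackend blas = select(() -> {
            // Hypothetical library name; loading fails with UnsatisfiedLinkError when it is absent.
            System.loadLibrary("hypothetical_ignite_cblas");
            // No JNI-backed backend exists in this sketch, so report the platform as unsupported.
            throw new UnsupportedOperationException("native backend not implemented in this sketch");
        });
        System.out.println(blas.name() + ": " + blas.ddot(new double[] {1, 2, 3}, new double[] {4, 5, 6}));
    }
}
```

Passing the native loader in as a factory is one way to make both test configurations reachable on a single machine: a test can supply a factory that throws {{UnsatisfiedLinkError}} to exercise the "library absent" path without uninstalling anything.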


Summing up the above, as of now implementing off-heap BLAS does not look worth the effort.

I have no reason to expect that the use case where it would be beneficial (that is, processing off-heap
data such that copying it on-heap is not feasible) is important enough to justify spending
the effort described above.

This decision may be reconsidered later when we gain a better understanding of the expected usage
of ML Grid.

> BLAS support for offheap vector/matrix
> --------------------------------------
>                 Key: IGNITE-5535
>                 URL: https://issues.apache.org/jira/browse/IGNITE-5535
>             Project: Ignite
>          Issue Type: Task
>          Components: ml
>            Reporter: Yury Babak
>            Assignee: Oleg Ignatenko
>         Attachments: IGNITE-5535.BLAS_support_for_offheap_vector_matrix.zip
> We want to add BLAS support for offheap structures. Currently we implement only the onheap version.

This message was sent by Atlassian JIRA
