flink-issues mailing list archives

From "Anton Mushin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5782) Support GPU calculations
Date Mon, 02 Oct 2017 10:52:01 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187863#comment-16187863
] 

Anton Mushin commented on FLINK-5782:
-------------------------------------

Hi everyone
bq. 2)	Currently Flink uses Breeze, to optimize linear algebra calculations, ND4J can’t
be integrated as is, because it still doesn’t support sparse arrays. Maybe this issue should
be simply contributed to ND4J to enable its usage?
ND4J now supports sparse arrays: https://github.com/deeplearning4j/nd4j/pull/1711
However, this functionality has not been released yet: https://github.com/deeplearning4j/nd4j/issues/66#issuecomment-332809785


> Support GPU calculations
> ------------------------
>
>                 Key: FLINK-5782
>                 URL: https://issues.apache.org/jira/browse/FLINK-5782
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.3.0
>            Reporter: Kate Eri
>            Assignee: Kate Eri
>            Priority: Minor
>
> This ticket was opened as a continuation of the dev discussion thread: [New Flink team
member - Kate Eri (Integration with DL4J topic)|http://mail-archives.apache.org/mod_mbox/flink-dev/201702.mbox/browser]
 
> We recently proposed integrating [Deeplearning4J|https://deeplearning4j.org/index.html]
with Apache Flink. 
> Training DL models is known to be resource-intensive, so training on a CPU can take much
longer to converge than on a GPU.
> GPUs could be used not only for DL training but also to optimize graph analytics and
other typical data manipulations. A good overview of GPU-related problems is given in
[Accelerating Spark workloads using GPUs|https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus].
> The community has so far raised the following issues to consider:
> 1)	Flink would like to avoid writing its own GPU support yet again, to reduce the engineering
burden. That’s why libraries such as [ND4J|http://nd4j.org/userguide] should be considered.

> 2)	Flink currently uses [Breeze|https://github.com/scalanlp/breeze] to optimize linear
algebra calculations. ND4J can’t be integrated as is, because it still doesn’t support
[sparse arrays|http://nd4j.org/userguide#faq]. Maybe this feature should simply be contributed
to ND4J to enable its usage?
> 3)	The calculations would have to work whether or not GPUs are available. If
the system detects that GPUs are available, then ideally it would exploit them. Thus GPU resource
management could be incorporated in [FLINK-5131|https://issues.apache.org/jira/browse/FLINK-5131]
(only suggested).
> 4)	It was mentioned that since Flink takes care of shipping data around the cluster,
it will also dump data out to the GPU for calculation and load the results back. In practice, the
lack of a persist method for intermediate results makes this troublesome (not because of GPUs,
but because any sort of complex algorithm expects to be able to cache intermediate
results).
> That’s why ticket [FLINK-1730|https://issues.apache.org/jira/browse/FLINK-1730]
must be implemented to solve this problem.
> 5)	It was also recommended to take a look at Apache Mahout, at least to learn from its
experience with GPU integration, and to check the following:
> https://github.com/apache/mahout/tree/master/viennacl-omp
> https://github.com/apache/mahout/tree/master/viennacl 
> 6)  For now, GPU support is proposed only for optimizing batch calculations. Supporting GPUs
for streaming should be tracked in a separate ticket, because optimizing streaming with GPUs
requires additional research.
> 7) The experience of Netflix on this question could also be considered: [Distributed
Neural Networks with GPUs in the AWS Cloud|http://techblog.netflix.com/search/label/CUDA]
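
The fallback behavior described in point 3 could look roughly like the sketch below. It is only an illustration under stated assumptions: `GpuFallbackSketch`, `Backend`, `CpuBackend`, `select`, and the `sketch.gpu.available` system property are all hypothetical names invented here, not Flink, Breeze, or ND4J APIs, and no real GPU detection is performed.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names are Flink, Breeze, or ND4J APIs.
class GpuFallbackSketch {

    /** Minimal linear algebra backend; a real one would expose far more operations. */
    interface Backend {
        String name();
        double dot(double[] a, double[] b);
    }

    /** Plain-Java backend that is always available. */
    static final class CpuBackend implements Backend {
        @Override
        public String name() {
            return "cpu";
        }

        @Override
        public double dot(double[] a, double[] b) {
            if (a.length != b.length) {
                throw new IllegalArgumentException("vector lengths differ");
            }
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                sum += a[i] * b[i];
            }
            return sum;
        }
    }

    /**
     * Returns the GPU backend when the probe reports a GPU and the factory
     * succeeds; otherwise returns the CPU fallback.
     */
    static Backend select(BooleanSupplier gpuProbe,
                          Supplier<Backend> gpuFactory,
                          Backend cpuFallback) {
        if (gpuProbe.getAsBoolean()) {
            try {
                Backend gpu = gpuFactory.get();
                if (gpu != null) {
                    return gpu;
                }
            } catch (RuntimeException e) {
                // GPU initialization failed; fall through to the CPU backend.
            }
        }
        return cpuFallback;
    }

    public static void main(String[] args) {
        // Assumption: availability is signalled by a made-up system property.
        BooleanSupplier probe = () -> Boolean.getBoolean("sketch.gpu.available");
        // Assumption: no GPU backend exists in this sketch, so the factory always fails.
        Supplier<Backend> gpuFactory = () -> {
            throw new UnsupportedOperationException("no GPU backend in this sketch");
        };

        Backend backend = select(probe, gpuFactory, new CpuBackend());
        double result = backend.dot(new double[] {1, 2, 3}, new double[] {4, 5, 6});
        System.out.println(backend.name() + " dot = " + result);
    }
}
```

The point of the design is that calling code only sees `Backend`, so the same job runs unchanged on machines with and without GPUs; how GPUs would actually be detected and scheduled is left to the resource management discussion in FLINK-5131.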
  
> This is considered the master ticket for GPU-related tickets.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
