From: "Kate Eri (JIRA)"
To: issues@flink.apache.org
Date: Fri, 17 Feb 2017 12:55:41 +0000 (UTC)
Subject: [jira] [Updated] (FLINK-5782) Support GPU calculations

[ https://issues.apache.org/jira/browse/FLINK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kate Eri updated FLINK-5782:
----------------------------
Description:

This ticket was initiated as a continuation of the dev discussion thread: [New Flink team member - Kate Eri (Integration with DL4J topic)|http://mail-archives.apache.org/mod_mbox/flink-dev/201702.mbox/browser]

Recently we proposed integrating [Deeplearning4J|https://deeplearning4j.org/index.html] with Apache Flink.

Training DL models is a resource-demanding process, and training on CPU can take far longer to converge than training on GPU.

GPU usage is attractive not only for DL training but also for accelerating graph analytics and other typical data manipulations; a good overview of the GPU-related problems is given in [Accelerating Spark workloads using GPUs|https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus].
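To make the performance claim concrete: the hot path of DL training is ordinary dense linear algebra, which is exactly what a GPU backend would take over. The sketch below is plain, illustrative Java (not Flink or DL4J code); the matrix-vector multiply shown here is the kind of kernel that, in a real integration, ND4J's native backend would execute on the GPU instead.

```java
// Illustrative only: a naive CPU matrix-vector multiply, i.e. the kind of
// dense linear-algebra kernel that dominates DL training time and that a
// GPU-backed library such as ND4J would be expected to accelerate.
public class MatVecDemo {
    static double[] matVec(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < v.length; j++) {
                sum += m[i][j] * v[j];   // O(n^2) work per layer, per sample
            }
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        double[] v = {1, 1};
        double[] r = matVec(m, v);
        System.out.println(r[0] + " " + r[1]);
    }
}
```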
Currently the community has pointed out the following issues to consider:

1) Flink would like to avoid writing its own GPU support from scratch, to reduce the engineering burden. That's why libraries like [ND4J|http://nd4j.org/userguide] should be considered.

2) Flink currently uses [Breeze|https://github.com/scalanlp/breeze] to optimize linear-algebra calculations. ND4J can't be integrated as-is, because it still doesn't support [sparse arrays|http://nd4j.org/userguide#faq]. Maybe sparse support should simply be contributed to ND4J to enable its usage?

3) The calculations have to work both with and without available GPUs. If the system detects that GPUs are available, then ideally it would exploit them. GPU resource management could thus be incorporated into [FLINK-5131|https://issues.apache.org/jira/browse/FLINK-5131] (only a suggestion).

4) It was mentioned that since Flink takes care of shipping data around the cluster, it would also have to dump data out to the GPU for calculation and load the results back. In practice, the lack of a persist method for intermediate results makes this troublesome (not because of GPUs, but because any sort of complex algorithm requires caching intermediate results). That's why [FLINK-1730|https://issues.apache.org/jira/browse/FLINK-1730] must be implemented to solve this problem.

5) It was also recommended to take a look at Apache Mahout, at least to learn from its experience with GPU integration, and to check its ViennaCL modules:
https://github.com/apache/mahout/tree/master/viennacl-omp
https://github.com/apache/mahout/tree/master/viennacl

6) For now, GPU support is proposed only for batch-calculation optimization; GPU support for streaming should be a separate ticket, because optimizing streaming with GPUs requires additional research.
7) Netflix's experience with this question could also be considered: [Distributed Neural Networks with GPUs in the AWS Cloud|http://techblog.netflix.com/search/label/CUDA]

This is considered the master ticket for GPU-related tickets.
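Point 3 above (work with and without GPUs) can be sketched as a backend-selection pattern. This is a minimal, hypothetical illustration in plain Java: the names `LinAlgBackend`, `CpuBackend`, `Backends`, and `isGpuAvailable` are invented for this sketch and are not existing Flink or ND4J API. In a real integration the availability probe would query the GPU library, and the GPU factory would be injected so the core keeps no hard GPU dependency.

```java
// Hypothetical sketch of point 3: pick a GPU-backed kernel when a GPU is
// available, otherwise fall back to a plain CPU implementation.
// All names here are illustrative, not real Flink or ND4J API.
import java.util.function.Supplier;

interface LinAlgBackend {
    double dot(double[] a, double[] b);
    String name();
}

class CpuBackend implements LinAlgBackend {
    public double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
    public String name() { return "cpu"; }
}

final class Backends {
    // In a real integration this probe would ask the native library whether
    // a CUDA/OpenCL device exists; here it is a stub that reports "no GPU".
    static boolean isGpuAvailable() {
        return false;
    }

    // The GPU backend is supplied lazily, so it is never even constructed
    // (and its native libraries never loaded) on GPU-less machines.
    static LinAlgBackend select(Supplier<LinAlgBackend> gpuFactory) {
        return isGpuAvailable() ? gpuFactory.get() : new CpuBackend();
    }
}

public class BackendSelectionDemo {
    public static void main(String[] args) {
        LinAlgBackend backend = Backends.select(() -> {
            // Never reached in this sketch, since isGpuAvailable() is false.
            throw new IllegalStateException("no GPU backend on this machine");
        });
        double result = backend.dot(new double[]{1, 2, 3}, new double[]{4, 5, 6});
        System.out.println(backend.name() + " " + result);
    }
}
```

The same shape would let FLINK-5131-style resource management decide, per task slot, which factory to hand in.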
> Support GPU calculations
> ------------------------
>
>                 Key: FLINK-5782
>                 URL: https://issues.apache.org/jira/browse/FLINK-5782
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.3.0
>            Reporter: Kate Eri
>            Priority: Minor
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)