spark-issues mailing list archives

From "Reza Farivar (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-3785) Support off-loading computations to a GPU
Date Tue, 07 Oct 2014 22:13:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162662#comment-14162662 ]

Reza Farivar edited comment on SPARK-3785 at 10/7/14 10:13 PM:
---------------------------------------------------------------

Note: Project Sumatra might become a part of Java 9, so we might get official GPU support
in Java sometime in the future.

Sean, I agree that the memory copying is an overhead, but for the right application it can
become small enough to ignore. Also, you can apply a series of operations on an RDD before
moving it back to CPU land. Think rdd.map(x => math.sin(x) * x).filter(_ < 100).map(x => 1 / x).
The distributed nature of the RDD could mean we can run a whole stage on the GPU, with each
task running on a different GPU in the cluster and not needing to get back to CPU land until
we reach a collect() or groupBy(), etc. I imagine we could have a subclass of ShuffleMapTask
that lives on the GPU side and calls a GPU kernel when its runTask() is called, as sketched
below.
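
To make that concrete, here is a minimal sketch against stock Spark (nothing here is real
GPU code: fusedKernel is a hypothetical stand-in that runs on the CPU where a real binding
such as JOCL would copy the batch to the device, launch a kernel, and copy the result back).
Using mapPartitions gives the kernel one whole partition per task, so the host/device copy
is amortized across the partition instead of paid per element:

    import org.apache.spark.{SparkConf, SparkContext}

    object GpuStageSketch {
      // Hypothetical stand-in for a native GPU kernel: a real binding (JOCL,
      // JavaCL, ...) would move the batch to device memory, run the fused
      // map/filter/map chain there, and copy the result back.
      def fusedKernel(batch: Array[Double]): Array[Double] =
        batch.map(x => math.sin(x) * x).filter(_ < 100).map(x => 1 / x)

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("gpu-stage-sketch").setMaster("local[*]"))
        val data = sc.parallelize(1 to 1000000).map(_.toDouble)

        // One "kernel launch" per partition: data crosses the CPU/GPU
        // boundary once per task instead of once per element.
        val result = data.mapPartitions(iter => fusedKernel(iter.toArray).iterator)

        println(result.take(5).mkString(", "))
        sc.stop()
      }
    }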

In fact, given that we have a good number of specialized RDDs, I think we could easily have
specialized GPU versions of them (say, CartesianRDD for instance); a rough sketch follows.
Where it gets tougher is in MappedRDD, where you would want to pass an arbitrary function
to the GPU and hope that it runs.
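
As an illustration of what such a specialized RDD could look like, here is a minimal sketch
against Spark's public RDD extension points (compute and getPartitions are real API;
GpuMappedDoubleRDD and gpuMapKernel are hypothetical names, and the kernel is again a CPU
stand-in for a device launch):

    import org.apache.spark.{Partition, TaskContext}
    import org.apache.spark.rdd.RDD

    // Hypothetical MappedRDD-style subclass whose compute() batches a whole
    // partition and hands it to a (stubbed) GPU kernel, instead of applying
    // a Scala closure element by element.
    class GpuMappedDoubleRDD(prev: RDD[Double]) extends RDD[Double](prev) {

      // Stand-in for a native kernel launch. A real version needs the function
      // expressed as OpenCL/CUDA code, which is exactly why arbitrary closures
      // (the MappedRDD case) are the hard part.
      private def gpuMapKernel(batch: Array[Double]): Array[Double] =
        batch.map(x => math.sin(x) * x)

      override protected def getPartitions: Array[Partition] = prev.partitions

      override def compute(split: Partition, context: TaskContext): Iterator[Double] =
        gpuMapKernel(prev.iterator(split, context).toArray).iterator
    }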


> Support off-loading computations to a GPU
> -----------------------------------------
>
>                 Key: SPARK-3785
>                 URL: https://issues.apache.org/jira/browse/SPARK-3785
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: MLlib
>            Reporter: Thomas Darimont
>            Priority: Minor
>
> Are there any plans to add support for off-loading computations to the GPU, e.g. via an OpenCL binding?
> http://www.jocl.org/
> https://code.google.com/p/javacl/
> http://lwjgl.org/wiki/index.php?title=OpenCL_in_LWJGL




