ignite-user mailing list archives

From Denis Magda <dma...@gridgain.com>
Subject Re: Basic Spark integration question
Date Thu, 11 Feb 2016 07:04:55 GMT

Please see inline

On 2/11/2016 8:48 AM, thesrc2016 wrote:
> Hi, I'm new to Ignite and just trying to get my head round how exactly it
> can integrate with Spark.
> I have been looking through the overview and the diagrams for IgniteRDD but
> still things are a little unclear to me.
> I guess my query comes down to - can Ignite RDD simply be configured for use
> by Spark as its internal RDD implementation, given IgniteRDD implements the
> RDD abstraction?  The examples given appear to require explicit coding in
> order to be able to make use of IgniteRDD...
Ignite RDD is a special implementation of the Spark RDD abstraction that 
keeps the results of Spark jobs in memory so that other Spark jobs can 
reuse them.
Underneath, Ignite RDD is based on Ignite Cache [1], so you have to 
obtain a reference to an Ignite RDD explicitly [2]:

val igniteContext = new IgniteContext[Integer, Integer](sparkContext,

val cacheRdd = igniteContext.fromCache("partitioned")

Once obtained, you can work with this RDD using the basic RDD API.
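
The constructor call quoted above is cut off in the archive. As a rough sketch of what a complete program could look like, the following assumes the Ignite 1.x Spark module that was current when this message was written (where IgniteContext takes the key and value types as type parameters), a default IgniteConfiguration supplied through a closure, a local Spark master, and an Ignite cluster that the job can reach. The object name, application name and sample data are made up for illustration.

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object, not part of the original message.
object IgniteRddSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local master, used only to keep the sketch self-contained.
    val conf = new SparkConf().setAppName("ignite-rdd-sketch").setMaster("local[2]")
    val sparkContext = new SparkContext(conf)

    // Assumption: a default IgniteConfiguration; a real setup would normally
    // configure discovery so the job joins an existing Ignite cluster.
    val igniteContext = new IgniteContext[Integer, Integer](
      sparkContext, () => new IgniteConfiguration())

    // An RDD view over the Ignite cache named "partitioned".
    val cacheRdd = igniteContext.fromCache("partitioned")

    // Store key/value pairs in the cache through the RDD.
    val pairs = sparkContext.parallelize(1 to 1000)
      .map(i => (Integer.valueOf(i), Integer.valueOf(i)))
    cacheRdd.savePairs(pairs)

    // From here on, ordinary RDD transformations and actions apply.
    val small = cacheRdd.filter(_._2.intValue < 10).count()
    println(s"entries with value below 10: $small")

    sparkContext.stop()
  }
}
```

Later Ignite releases changed this API (the type parameters moved from IgniteContext to fromCache), so the exact signatures should be checked against the version in use.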

> In my use case I want to access Spark capabilities through its SparkR API,
> and I want to accelerate processing with Spark's DataFrame SQL context
> available through this API so that it can use Ignite's in-memory indexing
> and other in-memory capabilities.
Here you should use IgniteRDD's 'sql' and 'objectSql' methods [3]:

val result = cacheRdd.sql(
   "select _val from Integer where _val > ? and _val < ?", 10, 100)

Indexing is configured using Ignite's CacheConfiguration [3]
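
As a minimal sketch of that configuration step, assuming Integer keys and values as in the snippets above, the cache can be declared with its key and value classes registered as indexed types so that SQL queries can see them. The helper object and its name are hypothetical.

```scala
import org.apache.ignite.configuration.CacheConfiguration

// Hypothetical helper, not part of the original message.
object IndexedCacheConfig {
  def apply(cacheName: String): CacheConfiguration[Integer, Integer] = {
    val cfg = new CacheConfiguration[Integer, Integer](cacheName)
    // Register the key/value classes as a queryable type; for a simple
    // value type like Integer the value is exposed to SQL as _val.
    cfg.setIndexedTypes(classOf[Integer], classOf[Integer])
    cfg
  }
}
```

The resulting configuration could then be handed to igniteContext.fromCache(...) in place of the cache name, assuming the overload that accepts a CacheConfiguration is available in the Ignite version in use.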

>   I'd also like to persist the loaded RDD in
> memory between different Spark Application sessions in order to speed up
> start-up and also share pre-loaded in-memory RDDs between different R
> applications. Is any of this possible in relation to how Ignite is
> implemented or intended to operate?
As I mentioned above, Ignite RDD is based on Ignite Cache, which 
distributes and persists the data across the available cluster nodes.

Does it make sense to you?

> Thanks!

[1] https://apacheignite.readme.io/docs/data-grid
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Basic-Spark-integration-question-tp2944.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
