ignite-dev mailing list archives

From Nikolay Izhikov <nizhikov....@gmail.com>
Subject Re: Spark data frames integration merged
Date Fri, 05 Jan 2018 15:42:39 GMT
Hello, guys.

Currently `getPreferredLocations` is implemented in
`IgniteRDD -> IgniteAbstractRDD`.

But the DataFrame implementation uses
`IgniteSQLDataFrameRDD -> IgniteSqlRDD -> IgniteAbstractRDD`,

where `->` means "extends".

So, for now, `getPreferredLocations` isn't implemented for an
IgniteDataFrame.

Please take a look at [1] and [2].

I think it is a very good idea to implement `getPreferredLocations` inside
`IgniteSQLDataFrameRDD` or even inside `IgniteAbstractRDD`.
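
To illustrate the gap, here is a minimal sketch of the hierarchy in plain
Scala. It is not the real code: `Rdd`, `Partition`, `AbstractIgniteRdd`,
`KeyValueRdd`, `SqlRdd`, `DataFrameRdd` and `LocalityAwareDataFrameRdd` are
hypothetical stand-ins, and the `Map` of partition to hosts only stands in
for a real affinity lookup. It shows one possible shape of the fix, assuming
the lookup can be shared from the abstract base class.

```scala
// Simplified stand-ins for the classes discussed above. These are NOT the
// real Ignite or Spark types; there is no Spark dependency here.
object PreferredLocationsSketch {
  // Hypothetical stand-in for a Spark partition.
  final case class Partition(index: Int)

  // Stand-in for Spark's RDD: by default it reports no locality preference.
  abstract class Rdd {
    def getPreferredLocations(split: Partition): Seq[String] = Nil
  }

  // Stand-in for IgniteAbstractRDD. `partitionToHosts` is an assumption that
  // replaces asking Ignite which nodes own a given partition.
  abstract class AbstractIgniteRdd(partitionToHosts: Map[Int, Seq[String]]) extends Rdd {
    // Shared helper; a subclass has to call it explicitly to benefit.
    protected def affinityHosts(split: Partition): Seq[String] =
      partitionToHosts.getOrElse(split.index, Nil)
  }

  // Mirrors the IgniteRDD branch: the method is overridden here.
  class KeyValueRdd(hosts: Map[Int, Seq[String]]) extends AbstractIgniteRdd(hosts) {
    override def getPreferredLocations(split: Partition): Seq[String] =
      affinityHosts(split)
  }

  // Mirrors IgniteSqlRDD and IgniteSQLDataFrameRDD today: nothing is
  // overridden, so the default (no preference) is inherited.
  class SqlRdd(hosts: Map[Int, Seq[String]]) extends AbstractIgniteRdd(hosts)
  class DataFrameRdd(hosts: Map[Int, Seq[String]]) extends SqlRdd(hosts)

  // One possible fix: override the method in the data frame RDD as well.
  class LocalityAwareDataFrameRdd(hosts: Map[Int, Seq[String]]) extends SqlRdd(hosts) {
    override def getPreferredLocations(split: Partition): Seq[String] =
      affinityHosts(split)
  }
}
```

Moving the override into the abstract base class instead would cover both
branches at once, which is the second option mentioned above.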

Can someone file a ticket? Otherwise I can do it myself.


[1] - https://github.com/apache/ignite/blob/master/modules/spark/src/main/scala/org/apache/ignite/spark/IgniteRDD.scala#L50

[2] - https://github.com/apache/ignite/blob/master/modules/spark/src/main/scala/org/apache/ignite/spark/impl/IgniteSQLDataFrameRDD.scala#L40


On Wed, 03/01/2018 at 15:35 -0800, Valentin Kulichenko wrote:
> Revin,
> 
> I doubt IgniteRDD#getPreferredLocations has any effect on data
> frames, but this is an interesting point. Nikolay, as a developer of
> this functionality, can you please comment on this?
> 
> -Val
> 
> On Wed, Jan 3, 2018 at 1:22 PM, Revin Chalil <rchalil@expedia.com>
> wrote:
> > Thanks Val for the info on indexes with DF. Do you know if adding
> > index / affinity keys on the cache helps with the join when the
> > IgniteRDD is joined with a Spark DF? The docs say that
> > 
> > “IgniteRDD also provides affinity information to Spark via
> > getPrefferredLocations method so that RDD computations use data
> > locality.”
> > 
> > I was wondering, if the affinitykey on the cache can be utilized in
> > the spark join?
> > 
> > 
> > On 1/3/18, 12:27 PM, "vkulichenko" <valentin.kulichenko@gmail.com>
> > wrote:
> > 
> >     Indexes would not be used during joins, at least in current
> >     implementation. Current integration is implemented as a regular
> >     Spark data source which provides each relation separately. Spark
> >     then performs join by itself, so Ignite indexes do not help.
> > 
> >     The easiest way to get binaries would be to use a nightly build
> >     [1], but it seems to be broken for some reason (latest is from
> >     May 31). I guess the only option at the moment is to build from
> >     source.
> > 
> >     [1]
> >     https://builds.apache.org/view/H-L/view/Ignite/job/Ignite-nightly/lastSuccessfulBuild/
> > 
> >     -Val
> > 
> > 
> > 
> >     --
> >     Sent from: http://apache-ignite-users.70518.x6.nabble.com/
> > 
> > 
> 
> 
