ignite-issues mailing list archives

From "Valentin Kulichenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-3084) Spark Data Frames Support in Apache Ignite
Date Wed, 27 Dec 2017 01:08:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304114#comment-16304114 ]

Valentin Kulichenko commented on IGNITE-3084:

[~NIzhikov], this looks much better now. There are a few points I want to reiterate, though.
* {{onApplicationEnd}} method - makes sense, but it sounds like it belongs at the {{IgniteContext}}
level. What do you think?
* {{IgniteSQLRelation#calcPartitions}} - got it, but what happens if the topology changes?
Will the partitions be recalculated?
* {{IgniteCacheRelation}} - let's remove it for now and discuss it on dev@ as a separate task.
If that discussion produces a good API, we can create a Jira ticket and implement it. That said,
I feel there are more important tasks at the moment, such as implementing a custom strategy
for SQL execution.
* {{org.apache.spark.sql.ignite}} package - we currently have three classes there, and it looks
like only {{IgniteSparkSession}} is meant to be used in application code. Can we move it
to the {{org.apache.ignite.spark}} package, alongside all the other public classes?
{{IgniteExternalCatalog}} and {{IgniteSharedState}} can then stay in this weird package, since
they are implementation details and not public.
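
To make the topology question above concrete, here is a minimal sketch of one possible answer: cache the calculated partitions together with the topology version they were computed for, and recompute when the version differs. This is only an illustration under stated assumptions, not the patch's actual behavior. {{TopologyAwarePartitions}}, the version supplier, and the calculator function are all hypothetical names and do not exist in Ignite.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Ignite: caches a partition list and
// recomputes it whenever the observed topology version changes.
final class TopologyAwarePartitions<P> {
    // Assumption: some way to read the cluster's current topology version.
    private final LongSupplier topologyVersion;
    // Assumption: a function that computes partitions for a given version.
    private final LongFunction<List<P>> calculator;

    private boolean initialized;
    private long cachedVersion;
    private List<P> cachedPartitions = Collections.emptyList();

    TopologyAwarePartitions(LongSupplier topologyVersion, LongFunction<List<P>> calculator) {
        this.topologyVersion = topologyVersion;
        this.calculator = calculator;
    }

    // Returns the cached partitions, recalculating them first if nothing has
    // been computed yet or the topology version differs from the cached one.
    synchronized List<P> partitions() {
        long current = topologyVersion.getAsLong();
        if (!initialized || current != cachedVersion) {
            cachedPartitions = calculator.apply(current);
            cachedVersion = current;
            initialized = true;
        }
        return cachedPartitions;
    }
}
```

A check like this only covers partitions requested after the change. Partitions already handed to running tasks would still reflect the old topology, so that case would need a separate decision.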

> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Nikolay Izhikov
>            Priority: Critical
>              Labels: bigdata, important
>             Fix For: 2.4
> Apache Spark already benefits from integration with Apache Ignite. The latter provides
> shared RDDs, an implementation of Spark RDD, that help Spark share state between Spark
> workers and execute SQL queries much faster. The next logical step is to enable support for
> the modern Spark Data Frames API in a similar way.
> As a contributor, you will be fully in charge of the integration of the Spark Data Frame
> API and Apache Ignite.

This message was sent by Atlassian JIRA
