ignite-issues mailing list archives

From "Valentin Kulichenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-3084) Investigate how Ignite can support Spark DataFrame
Date Tue, 03 Jan 2017 05:36:58 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794210#comment-15794210 ]

Valentin Kulichenko commented on IGNITE-3084:

I did some investigation, and here is what, in my view, needs to be done to support integration
between Ignite and Spark DataFrame.

# Provide an implementation of {{BaseRelation}} mixed with {{PrunedFilteredScan}}. It should
be able to execute a query based on the provided filters and selected fields and return an RDD that
iterates through the results. Since an RDD works at the partition level, we will most likely need
to add the ability to run a SQL query on a particular partition.
# Provide an implementation of {{Catalog}} to properly look up Ignite relations.
# Create an {{IgniteSQLContext}} that overrides the catalog.
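
The core of step 1 can be sketched as follows. This is only an illustration of how the selected fields and pushed-down filters might be turned into one SQL query per partition. The {{Filter}} case classes are local stand-ins named after Spark's {{org.apache.spark.sql.sources}} filters so that the sketch runs without Spark, and {{IgniteScanSql}}, {{PartitionQuery}}, the table name and the column names are all hypothetical. How Ignite would actually restrict a query to one partition is left open, as noted above.

```scala
// Hypothetical stand-ins for Spark's org.apache.spark.sql.sources filters,
// redefined here so the sketch runs without Spark on the classpath.
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class GreaterThan(attribute: String, value: Any) extends Filter
final case class LessThan(attribute: String, value: Any) extends Filter
final case class IsNotNull(attribute: String) extends Filter

// Hypothetical pairing of a partition id with the SQL to run against it.
final case class PartitionQuery(partition: Int, sql: String)

object IgniteScanSql {
  // Inline a literal for readability; a real implementation would bind parameters.
  private def literal(v: Any): String = v match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  private def condition(f: Filter): String = f match {
    case EqualTo(a, v)     => s"$a = ${literal(v)}"
    case GreaterThan(a, v) => s"$a > ${literal(v)}"
    case LessThan(a, v)    => s"$a < ${literal(v)}"
    case IsNotNull(a)      => s"$a IS NOT NULL"
  }

  // Build the SQL for a scan of one table from the selected fields and filters.
  // An empty column list falls back to * in this sketch.
  def buildQuery(table: String, requiredColumns: Seq[String], filters: Seq[Filter]): String = {
    val select = if (requiredColumns.isEmpty) "*" else requiredColumns.mkString(", ")
    val where =
      if (filters.isEmpty) ""
      else filters.map(condition).mkString(" WHERE ", " AND ", "")
    s"SELECT $select FROM $table$where"
  }

  // One query per partition, mirroring the fact that an RDD is computed per partition.
  def perPartition(numPartitions: Int, sql: String): Seq[PartitionQuery] =
    (0 until numPartitions).map(p => PartitionQuery(p, sql))
}
```

For example, {{IgniteScanSql.buildQuery("Person", Seq("name", "age"), Seq(GreaterThan("age", 30), EqualTo("city", "Paris")))}} yields {{SELECT name, age FROM Person WHERE age > 30 AND city = 'Paris'}}.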

The steps above will add a new data source to Spark. However, when Spark executes a query,
it generally first fetches data from the source into its own memory to create RDDs. This is
therefore not enough for Ignite, because we already have the data in memory. When only
Ignite data participates in the query, we want Spark to issue the query directly to Ignite.

To accomplish this, we can provide our own implementation of {{Strategy}}, which Spark uses
to convert a logical plan into a physical plan. For any type of {{LogicalPlan}}, this custom strategy
should be able to generate a SQL query for Ignite based on the whole plan tree. If there are
non-Ignite relations in the plan, we should fall back to the native Spark strategies (return {{Nil}}
as the physical plan).
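
The decision logic of such a strategy can be modeled without Spark. The sketch below uses a heavily simplified, hypothetical plan tree ({{Plan}}, {{IgniteTable}}, {{ExternalTable}}, {{Project}}, {{Where}}, {{Join}} are stand-ins, not Catalyst classes) and a plain SQL string in place of a physical plan. It only illustrates the two rules described above: generate SQL from the whole tree when every relation is an Ignite relation, and return {{Nil}} otherwise.

```scala
// Hypothetical, heavily simplified stand-ins for Catalyst's LogicalPlan nodes.
sealed trait Plan
final case class IgniteTable(name: String) extends Plan    // relation backed by Ignite
final case class ExternalTable(name: String) extends Plan  // any non-Ignite relation
final case class Project(columns: Seq[String], child: Plan) extends Plan
final case class Where(condition: String, child: Plan) extends Plan
final case class Join(left: Plan, right: Plan, on: String) extends Plan

object IgniteStrategySketch {
  // True only when every leaf of the plan tree is an Ignite relation.
  def igniteOnly(plan: Plan): Boolean = plan match {
    case IgniteTable(_)   => true
    case ExternalTable(_) => false
    case Project(_, c)    => igniteOnly(c)
    case Where(_, c)      => igniteOnly(c)
    case Join(l, r, _)    => igniteOnly(l) && igniteOnly(r)
  }

  // Tables are referenced by name; any other node is nested as a subquery.
  // This is naive: real SQL generation would need aliases and flattening.
  private def source(plan: Plan): String = plan match {
    case IgniteTable(n)   => n
    case ExternalTable(n) => n
    case other            => s"(${toSql(other)})"
  }

  // Render the whole plan tree as a single SQL statement.
  def toSql(plan: Plan): String = plan match {
    case IgniteTable(n)   => s"SELECT * FROM $n"
    case ExternalTable(n) => s"SELECT * FROM $n"
    case Project(cols, c) => s"SELECT ${cols.mkString(", ")} FROM ${source(c)}"
    case Where(cond, c)   => s"SELECT * FROM ${source(c)} WHERE $cond"
    case Join(l, r, on)   => s"SELECT * FROM ${source(l)} JOIN ${source(r)} ON $on"
  }

  // The SQL string plays the role of the physical plan. Nil means
  // "this strategy does not apply", so native strategies take over.
  def apply(plan: Plan): Seq[String] =
    if (igniteOnly(plan)) Seq(toSql(plan)) else Nil
}
```

With this model, a projection over a filtered Ignite table produces one query for Ignite, while a join between an Ignite table and an external table produces {{Nil}}.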

{{IgniteSQLContext}} should append the custom strategy to the collection of Spark strategies.
Here is a good example of how a custom strategy can be created and injected: https://gist.github.com/marmbrus/f3d121a1bc5b6d6b57b9
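
As a rough illustration of the injection itself, the sketch below registers a placeholder strategy through Spark's experimental {{extraStrategies}} hook. It assumes Spark 2.x on the classpath. {{IgniteStrategy}} is a hypothetical name, and its body always returns {{Nil}}, so it changes nothing about how queries are planned. The proposed {{IgniteSQLContext}} could perform the same registration internally.

```scala
import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// Hypothetical placeholder: a real implementation would inspect the plan and
// return a physical plan that runs the generated SQL in Ignite.
object IgniteStrategy extends Strategy {
  // Returning Nil tells the planner to fall back to the native strategies.
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

object InjectStrategyExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ignite-strategy-sketch")
      .getOrCreate()

    // Append the custom strategy to the strategies already registered.
    spark.experimental.extraStrategies =
      spark.experimental.extraStrategies :+ IgniteStrategy

    spark.stop()
  }
}
```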

> Investigate how Ignite can support Spark DataFrame
> --------------------------------------------------
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: Ignite RDD
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Valentin Kulichenko
>              Labels: bigdata
>             Fix For: 2.0
> We see increasing demand for good DataFrame support in our Spark integration. We need to
> investigate how we could do that.
> It looks like we can investigate how MemSQL does that and take it as a starting point.

This message was sent by Atlassian JIRA
