ignite-issues mailing list archives

From "Nikolay Izhikov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-7077) Spark Data Frame Support. Strategy to convert complete query to Ignite SQL
Date Fri, 06 Apr 2018 04:25:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427949#comment-16427949 ]

Nikolay Izhikov commented on IGNITE-7077:
-----------------------------------------

[~vkulichenko]

First, thank you for paying attention to my task!

> I thought this task implied implementation of Strategy to convert Spark's logical plan
to physical plan that would be executed directly on Ignite as a SQL query. 

1. A {{LogicalPlan}} is the tree representation of a SQL query.

2. A leaf of the {{LogicalPlan}} is a wrapper for {{IgniteSQLRelation}}.

3. As far as I can see, we don't need to provide a custom {{SparkStrategy}}, at least not for
the query optimization described in this issue.
A {{SparkStrategy}} deals with Spark 'physical' operations: how to select and process
data from the underlying data sources optimally.
This differs from a {{LogicalPlan}}, which is a tree representation of the SQL query.

> Here I see the implementation of Optimization. Can you please clarify why is that and
what is the difference? How the current implementation work?

My implementation does the following:

1. Transform part (or the whole) of the LogicalPlan from the bottom up, pushing every supported
SQL operator into the Ignite accumulator.
If an operator is pushed to the accumulator, its tree node is removed from the plan, because
it will be executed internally in Ignite.

2. If an unsupported operator (a UDF or similar) is found at some level, that level and all
levels above it in the LogicalPlan remain unchanged.

3. Create a SQL query from the accumulator. It will be executed directly in Ignite.

4. Replace the accumulator with an {{IgniteSQLAccumulatorRelation}}, a relation that contains
the resulting SQL query.

After step 4 we have a LogicalPlan that executes all supported SQL operators directly in
Ignite.
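
The four steps above can be sketched with a toy model. The following Python is a minimal illustration only, not the actual Ignite or Spark code: the node classes (Relation, Filter, Project, Sort, Udf), the Accumulator, and the functions push_down, finalize and optimize are all hypothetical names invented for this sketch. It ignores details a real implementation has to handle, such as operator orderings that require subqueries, and it does not add the {{id IS NOT NULL}} condition that appears in the optimized plan shown later in this message.

```python
from dataclasses import dataclass, field, replace
from typing import List


# Toy logical plan nodes (hypothetical; these are NOT Spark's classes).
@dataclass
class Relation:
    table: str
    columns: List[str]


@dataclass
class Filter:
    condition: str
    child: object


@dataclass
class Project:
    columns: List[str]
    child: object


@dataclass
class Sort:
    order_by: List[str]
    child: object


@dataclass
class Udf:
    """Stands in for any operator that cannot be pushed down."""
    name: str
    child: object


@dataclass
class Accumulator:
    """Collects the operators that have been pushed down so far."""
    table: str
    columns: List[str]
    where: List[str] = field(default_factory=list)
    order_by: List[str] = field(default_factory=list)

    def to_sql(self) -> str:
        # Step 3: build the SQL text from the accumulated operators.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.where:
            sql += " WHERE " + " AND ".join(self.where)
        if self.order_by:
            sql += " ORDER BY " + ", ".join(self.order_by)
        return sql


@dataclass
class AccumulatorRelation:
    """Leaf that replaces the pushed-down subtree and carries the SQL query."""
    columns: List[str]
    qry: str


def finalize(acc: Accumulator) -> AccumulatorRelation:
    # Step 4: replace the accumulator with a relation holding the query.
    return AccumulatorRelation(list(acc.columns), acc.to_sql())


def push_down(plan):
    # Step 1: walk the tree bottom-up, folding supported nodes into the accumulator.
    if isinstance(plan, Relation):
        return Accumulator(plan.table, list(plan.columns))
    child = push_down(plan.child)
    if not isinstance(child, Accumulator):
        # Step 2: something below was unsupported, so this node stays as it is.
        return replace(plan, child=child)
    if isinstance(plan, Filter):
        child.where.append(plan.condition)
        return child
    if isinstance(plan, Project):
        child.columns = list(plan.columns)
        return child
    if isinstance(plan, Sort):
        child.order_by = list(plan.order_by)
        return child
    # Step 2: unsupported operator; close the accumulator underneath it.
    return replace(plan, child=finalize(child))


def optimize(plan):
    result = push_down(plan)
    return finalize(result) if isinstance(result, Accumulator) else result


# Usage: a Sort / Project / Filter / Relation plan collapses into one query.
plan = Sort(["id"], Project(["ID", "NAME"],
            Filter("id > 1", Relation("CITY", ["NAME", "ID"]))))
print(optimize(plan).qry)
# prints: SELECT ID, NAME FROM CITY WHERE id > 1 ORDER BY id
```

If a Udf node sits between the Sort and the Filter in this toy model, only the Filter and the Relation collapse into an AccumulatorRelation, while the Udf and the Sort above it stay in the plan, which mirrors step 2.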

> Also I think we should add some examples demonstrating the new functionality.

If it works correctly then we don't need any new examples, because the implementation details
are hidden from the user.
Here is what changed:

This is how the Spark plan looks *before* the Ignite optimization:

{noformat}
== Analyzed Logical Plan ==
id: bigint, name: string
Sort [id#180L ASC NULLS FIRST], true
+- Project [id#180L, name#179]
   +- Filter (id#180L > cast(1 as bigint))
      +- SubqueryAlias city
         +- Relation[NAME#179,ID#180L] IgniteSQLRelation[table=CITY]
{noformat}

After optimization, we get the following plan. Please note that the query inside IgniteSQLAccumulatorRelation
represents the same query as the previous plan.

{noformat}
== Optimized Logical Plan ==
Relation[ID#180L,NAME#179] IgniteSQLAccumulatorRelation(columns=[ID, NAME], qry=SELECT ID, NAME FROM CITY WHERE id IS NOT NULL AND id > 1 ORDER BY id)
{noformat}

Please see the debug output of the IgniteOptimization*Spec tests, for example {{IgniteOptimizationSpec}}
or {{IgniteOptimizationStringFuncSpec}}.
You can see additional examples of LogicalPlan transformations there.

> Spark Data Frame Support. Strategy to convert complete query to Ignite SQL
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-7077
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7077
>             Project: Ignite
>          Issue Type: New Feature
>          Components: spark
>    Affects Versions: 2.3
>            Reporter: Nikolay Izhikov
>            Assignee: Nikolay Izhikov
>            Priority: Major
>              Labels: bigdata
>             Fix For: 2.5
>
>
> Basic support of Spark Data Frame for Ignite implemented in IGNITE-3084.
> We need to implement custom spark strategy that can convert whole Spark SQL query to
Ignite SQL Query if query consists of only Ignite tables.
> The strategy does nothing if spark query includes not only Ignite tables.
> Memsql implementation can be taken as an example - https://github.com/memsql/memsql-spark-connector



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
