ignite-issues mailing list archives

From "Nikolay Izhikov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-3084) Spark Data Frames Support in Apache Ignite
Date Thu, 14 Dec 2017 10:07:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290644#comment-16290644 ]

Nikolay Izhikov commented on IGNITE-3084:


> IgniteCacheRelation is questionable. Main problem is that it works with classes which
> are not always available.

With BinaryMarshaller, key and value classes need to be available only on the master node.
I think this is a very common case.

> Also what if schema is dynamic, how are we going to support it? 

If we can't split objects into groups by a common field (type, class, or something similar), we can
query only the fields common to all cache classes.
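
A minimal sketch of the "common fields only" fallback, in Java. It assumes each value class stored in the cache is already described as a field-name-to-type-name map (a hypothetical model; in practice this would come from class or binary-type metadata). `CommonSchema` and `commonFields` are illustrative names, not Ignite or Spark APIs. A field survives only if every class declares it with the same type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommonSchema {
    // Each map describes one value class in the cache:
    // field name -> type name (hypothetical stand-in for real metadata).
    public static Map<String, String> commonFields(List<Map<String, String>> classSchemas) {
        Map<String, String> common = new LinkedHashMap<>();
        if (classSchemas.isEmpty()) {
            return common;
        }
        // Start from the first class's fields, keeping its field order.
        common.putAll(classSchemas.get(0));
        // Keep a field only if every other class declares it with the same type.
        for (Map<String, String> schema : classSchemas.subList(1, classSchemas.size())) {
            common.entrySet().removeIf(f -> !f.getValue().equals(schema.get(f.getKey())));
        }
        return common;
    }

    public static void main(String[] args) {
        Map<String, String> person = new LinkedHashMap<>();
        person.put("id", "LONG");
        person.put("name", "VARCHAR");
        person.put("salary", "DOUBLE");

        Map<String, String> company = new LinkedHashMap<>();
        company.put("id", "LONG");
        company.put("name", "VARCHAR");
        company.put("city", "VARCHAR");

        // Prints {id=LONG, name=VARCHAR}
        System.out.println(commonFields(List.of(person, company)));
    }
}
```

In this sketch, a field whose type differs between classes is dropped rather than coerced to a wider type. That is a simplifying choice made here, not established behavior of IgniteCacheRelation.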

> I think it's better to support data frames only via Ignite SQL, unless we come up with
> a cleaner solution. Let me know what you think.

As you know, I think Ignite should provide the ability to query a key-value cache via a generic,
widely used SQL interface.
I think the limitation you mentioned is natural for users and can simply be documented.

But, without a doubt, you are a more experienced Ignite developer than I am.
So if you decide to exclude DataFrame support for key-value caches, let's exclude it.

If we exclude this feature for now, should I create a separate ticket to discuss it in the future?

> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Nikolay Izhikov
>            Priority: Critical
>              Labels: bigdata, important
>             Fix For: 2.4
> Apache Spark already benefits from integration with Apache Ignite. The latter provides
> shared RDDs, an implementation of Spark RDD, that help Spark share state between Spark
> workers and execute SQL queries much faster. The next logical step is to enable support for
> the modern Spark Data Frames API in a similar way.
> As a contributor, you will be fully in charge of the integration of the Spark Data Frame
> API and Apache Ignite.

This message was sent by Atlassian JIRA
