ignite-issues mailing list archives

From "Nikolay Izhikov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IGNITE-3084) Spark Data Frames Support in Apache Ignite
Date Tue, 28 Nov 2017 10:33:00 GMT

    [ https://issues.apache.org/jira/browse/IGNITE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268523#comment-16268523 ]

Nikolay Izhikov commented on IGNITE-3084:

We can't implement IgniteCatalog for Spark 2.1.
So I propose updating the Spark dependencies of the {{spark}} module to 2.2.0 as part of this task.

1. To set up IgniteCatalog we need to override the `SharedState.externalCatalog` val so that Spark can look up Ignite tables.
2. When externalCatalog is overridden, it is still null while the SharedState instance is being initialized, because of Scala's initialization order: [https://docs.scala-lang.org/tutorials/FAQ/initialization-order.html]
3. In 2.1.2, externalCatalog is used in an internal initializer of SharedState: [SharedState.scala|https://github.com/apache/spark/blob/v2.1.2/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L96]
4. In 2.2.0, SharedState.scala was fixed in a way that allows externalCatalog to be overridden: [SharedState-2.2.0|https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L93]

    val defaultDbDefinition = CatalogDatabase(
      SessionCatalog.DEFAULT_DATABASE, "default database", warehousePath, Map())
    // Problem is here! externalCatalog == null if we override it.
    if (!externalCatalog.databaseExists(SessionCatalog.DEFAULT_DATABASE)) {
      externalCatalog.createDatabase(defaultDbDefinition, ignoreIfExists = true)
    }
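
The initialization-order issue can be reproduced without Spark. Below is a minimal sketch; all class names (Catalog, EagerState, LazyState and their subclasses) are hypothetical stand-ins, not Spark or Ignite classes. It wraps the read in Option(...) so that the null shows up as None instead of a NullPointerException. The lazy val variant is one common way to make such an override safe; it is shown as an assumption about the general technique, not as a description of the exact change made in Spark 2.2.0.

```scala
object InitOrderDemo {
  // Hypothetical stand-in for an external catalog.
  class Catalog(val name: String)

  // Problem shape: a strict val that is read inside the parent constructor.
  class EagerState {
    val externalCatalog: Catalog = new Catalog("default")
    // The accessor is virtual: with an overriding subclass it reads the
    // subclass field, which has not been assigned yet at this point.
    val seenDuringInit: Option[String] = Option(externalCatalog).map(_.name)
  }

  class EagerOverride extends EagerState {
    override val externalCatalog: Catalog = new Catalog("custom")
  }

  // Possible fix shape: a lazy val is initialized on first access,
  // so the subclass value is available even in the parent constructor.
  class LazyState {
    lazy val externalCatalog: Catalog = new Catalog("default")
    val seenDuringInit: Option[String] = Option(externalCatalog).map(_.name)
  }

  class LazyOverride extends LazyState {
    override lazy val externalCatalog: Catalog = new Catalog("custom")
  }

  def main(args: Array[String]): Unit = {
    println(new EagerState().seenDuringInit)    // Some(default)
    println(new EagerOverride().seenDuringInit) // None: the override was still null
    println(new LazyOverride().seenDuringInit)  // Some(custom)
  }
}
```

In the real 2.1.x initializer the value is dereferenced directly, so the same situation would surface as a NullPointerException rather than None.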

> Spark Data Frames Support in Apache Ignite
> ------------------------------------------
>                 Key: IGNITE-3084
>                 URL: https://issues.apache.org/jira/browse/IGNITE-3084
>             Project: Ignite
>          Issue Type: Task
>          Components: spark
>    Affects Versions: 1.5.0.final
>            Reporter: Vladimir Ozerov
>            Assignee: Nikolay Izhikov
>              Labels: bigdata
>             Fix For: 2.4
> Apache Spark already benefits from integration with Apache Ignite. The latter provides shared RDDs, an implementation of Spark RDD, that help Spark share state between Spark workers and execute SQL queries much faster. The next logical step is to enable support for the modern Spark Data Frames API in a similar way.
> As a contributor, you will be fully in charge of the integration of the Spark Data Frame API and Apache Ignite.
