ignite-dev mailing list archives

From "Николай Ижиков" <nizhikov....@gmail.com>
Subject Re: Integration of Spark and Ignite. Prototype.
Date Fri, 24 Nov 2017 13:42:02 GMT
Hello, Val, Denis.

 > Personally, I think that we should release the integration only after the strategy is fully supported.

I see two major reasons to propose merging the DataFrame API implementation without the custom strategy:

1. My PR is already relatively huge. From my experience of interaction with the Ignite community,
the bigger a PR becomes, the more committer time is required to review it.
So I propose to move in smaller, but complete, steps here.

2. It is not clear to me what exactly "custom strategy and optimization" includes.
It seems that additional discussion is required.
I think I can put my thoughts on paper and start that discussion right after the basic implementation
is done.

 > Custom strategy implementation is actually very important for this integration.

Understood and fully agreed.
I'm ready to continue working in that area.
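
To make the discussion concrete, here is the rough shape of a custom strategy as I understand it. This is a minimal sketch with hypothetical names, not the actual implementation:

```
import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// A Strategy sees the analyzed LogicalPlan and may replace a whole
// subtree with its own physical operator. For Ignite the idea is:
// if every relation in the plan is Ignite-backed, generate a single
// Ignite SQL query and execute it on the cluster, instead of letting
// Spark fetch raw rows and do joins/aggregations itself.
object IgniteStrategySketch extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = {
    // The real implementation would pattern-match Ignite relations here
    // and return a custom SparkPlan running the generated SQL on Ignite.
    Nil // Nil means "not my case", so native Spark strategies apply.
  }
}

// Registration uses standard Spark API:
// sparkSession.experimental.extraStrategies = Seq(IgniteStrategySketch)
```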

On 23.11.2017 02:15, Denis Magda wrote:
> Val, Nikolay,
> 
> Personally, I think that we should release the integration only after the strategy is
> fully supported. Without the strategy we don’t really leverage Ignite’s SQL engine
> and introduce redundant data movement between Ignite and Spark nodes.
> 
> How big is the effort to support the strategy in terms of the amount of work left? 40%,
> 60%, 80%?
> 
> —
> Denis
> 
>> On Nov 22, 2017, at 2:57 PM, Valentin Kulichenko <valentin.kulichenko@gmail.com> wrote:
>>
>> Nikolay,
>>
>> Custom strategy implementation is actually very important for this
>> integration. Basically, it will allow us to create a SQL query for Ignite and
>> execute it directly on the cluster. Your current implementation only adds a
>> new DataSource, which means that Spark will fetch data into its own memory
>> first, and then do most of the work (like joins, for example). Does it make
>> sense to you? Can you please take a look at this and provide your thoughts
>> on how much development is implied there?
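>>
>> To make the contrast concrete, a sketch, where the format and option
>> names are illustrative rather than the exact API from the PR:
>>
>> ```
>> // Both tables come in through the DataSource, so Spark first pulls
>> // the rows of each into its own memory and then joins them itself:
>> val persons = spark.read.format("ignite").option("table", "person").load()
>> val cities  = spark.read.format("ignite").option("table", "city").load()
>> persons.join(cities, persons("cityId") === cities("id")).show()
>>
>> // With a custom strategy the same plan could instead be collapsed into
>> // a single query executed by Ignite's own SQL engine:
>> //   SELECT ... FROM person p JOIN city c ON p.cityId = c.id
>> ```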
>>
>> The current code looks good to me, though, and I'm OK if the strategy is
>> implemented as a next step in the scope of a separate ticket. I will do the final
>> review early next week and will merge it if everything is OK.
>>
>> -Val
>>
>> On Thu, Oct 19, 2017 at 7:29 AM, Николай Ижиков <nizhikov.dev@gmail.com>
>> wrote:
>>
>>> Hello.
>>>
>>>> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog
>>> implementations and what is the difference?
>>>
>>> IgniteCatalog removed.
>>>
>>>> 5. I don't like that IgniteStrategy and IgniteOptimization have to be
>>> set manually on SQLContext each time it's created....Is there any way to
>>> automate this and improve usability?
>>>
>>> IgniteStrategy and IgniteOptimization are removed, as they are empty now.
>>>
>>>> Actually, I think it makes sense to create a builder similar to
>>> SparkSession.builder()...
>>>
>>> IgniteBuilder added.
>>> Syntax looks like:
>>>
>>> ```
>>> val igniteSession = IgniteSparkSession.builder()
>>>     .appName("Spark Ignite catalog example")
>>>     .master("local")
>>>     .config("spark.executor.instances", "2")
>>>     .igniteConfig(CONFIG)
>>>     .getOrCreate()
>>>
>>> igniteSession.catalog.listTables().show()
>>> ```
>>>
>>> Please, see updated PR - https://github.com/apache/ignite/pull/2742
>>>
>>> 2017-10-18 20:02 GMT+03:00 Николай Ижиков <nizhikov.dev@gmail.com>:
>>>
>>>> Hello, Valentin.
>>>>
>>>> My answers are below.
>>>> Dmitry, do we need to move discussion to Jira?
>>>>
>>>>> 1. Why do we have org.apache.spark.sql.ignite package in our codebase?
>>>>
>>>> As I mentioned earlier, to implement and override the Spark Catalog one has
>>>> to use internal (private) Spark API.
>>>> So I have to use the package `org.apache.spark.sql.***` to have access to private
>>>> classes and variables.
>>>>
>>>> For example, the SharedState class that stores the link to ExternalCatalog
>>>> is declared as `private[sql] class SharedState` - i.e. package private.
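>>>>
>>>> For illustration, the trick looks roughly like this (a sketch; the object
>>>> name is made up):
>>>>
>>>> ```
>>>> // Declared inside Spark's own package so that private[sql] members
>>>> // (SharedState, session internals) become visible to our code.
>>>> package org.apache.spark.sql.ignite
>>>>
>>>> import org.apache.spark.sql.SparkSession
>>>>
>>>> object CatalogAccessSketch {
>>>>   // sharedState is private[sql] in SparkSession; this compiles only
>>>>   // because the file lives under the org.apache.spark.sql package.
>>>>   def externalCatalogOf(session: SparkSession) =
>>>>     session.sharedState.externalCatalog
>>>> }
>>>> ```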
>>>>
>>>>> Can these classes reside under org.apache.ignite.spark instead?
>>>>
>>>> No, as long as we want to have our own implementation of ExternalCatalog.
>>>>
>>>>> 2. IgniteRelationProvider contains multiple constants which I guess are
>>>> some kind of config options. Can you describe the purpose of each of them?
>>>>
>>>> I extended the comments for these options.
>>>> Please see my commit [1] or the PR HEAD.
>>>>
>>>>> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog
>>>> implementations and what is the difference?
>>>>
>>>> Good catch, thank you!
>>>> After additional research I found that only IgniteExternalCatalog is
>>>> required.
>>>> I will update the PR with IgniteCatalog removed in a few days.
>>>>
>>>>> 4. IgniteStrategy and IgniteOptimization are currently no-op. What are
>>>> our plans on implementing them? Also, what exactly is planned in
>>>> IgniteOptimization and what is its purpose?
>>>>
>>>> Actually, this is a very good question :)
>>>> And I need advice from experienced community members here:
>>>>
>>>> `IgniteOptimization`'s purpose is to modify the query plan created by Spark.
>>>> Currently, we have one optimization, described in IGNITE-3084 [2] by you,
>>>> Valentin :) :
>>>>
>>>> “If there are non-Ignite relations in the plan, we should fall back to
>>>> native Spark strategies“
>>>>
>>>> I think we can go a little further and reduce a join of two Ignite-backed
>>>> Data Frames into a single Ignite SQL query. Currently, this feature is
>>>> unimplemented.
>>>>
>>>> *Do we need it now? Or can we postpone it and concentrate on the basic Data
>>>> Frame and Catalog implementation?*
>>>>
>>>> `Strategy`'s purpose, as you correctly mentioned in [2], is to transform a
>>>> LogicalPlan into physical operators.
>>>> I don't have ideas on how to use this opportunity. So I think we don't need
>>>> IgniteStrategy.
>>>>
>>>> Can you or anyone else suggest some optimization strategy to speed up SQL
>>>> query execution?
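>>>>
>>>> To make this concrete, the rough shape of such an optimization, as a
>>>> sketch only, with the real checks left out:
>>>>
>>>> ```
>>>> import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
>>>> import org.apache.spark.sql.catalyst.rules.Rule
>>>>
>>>> // An optimization is a Rule[LogicalPlan] that rewrites the plan.
>>>> // Idea from IGNITE-3084: touch the plan only when every relation in
>>>> // it is Ignite-backed; otherwise fall back to native Spark handling.
>>>> object IgniteOptimizationSketch extends Rule[LogicalPlan] {
>>>>   override def apply(plan: LogicalPlan): LogicalPlan =
>>>>     if (containsOnlyIgniteRelations(plan))
>>>>       rewriteToSingleIgniteQuery(plan) // e.g. fuse a join of two
>>>>                                        // Ignite Data Frames into one
>>>>                                        // Ignite SQL query
>>>>     else
>>>>       plan // non-Ignite relations present: change nothing
>>>>
>>>>   // Placeholders for the real logic:
>>>>   private def containsOnlyIgniteRelations(p: LogicalPlan) = false
>>>>   private def rewriteToSingleIgniteQuery(p: LogicalPlan) = p
>>>> }
>>>>
>>>> // Registered via session.experimental.extraOptimizations.
>>>> ```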
>>>>
>>>>> 5. I don't like that IgniteStrategy and IgniteOptimization have to be
>>>> set manually on SQLContext each time it's created....Is there any way to
>>>> automate this and improve usability?
>>>>
>>>> These classes are added to `extraOptimizations` when one uses
>>>> IgniteSparkSession.
>>>> As far as I know, there is no way to automatically add these classes to
>>>> regular SparkSession.
>>>>
>>>>> 6. What is the purpose of IgniteSparkSession? I see it's used in
>>>> IgniteCatalogExample but not in IgniteDataFrameExample, which is confusing.
>>>>
>>>> DataFrame API is *public* Spark API. So anyone can provide an implementation
>>>> and plug it into Spark. That’s why IgniteDataFrameExample doesn’t need any
>>>> Ignite-specific session.
>>>>
>>>> Catalog API is *internal* Spark API. There is no way to plug a custom
>>>> catalog implementation into Spark [3]. So we have to use
>>>> `IgniteSparkSession`, which extends the regular SparkSession and overrides links
>>>> to `ExternalCatalog`.
>>>>
>>>>> 7. To create IgniteSparkSession we first create IgniteContext. Is it
>>>> really needed? It looks like we can directly provide the configuration
>>>> file; if IgniteSparkSession really requires IgniteContext, it can create it
>>>> by itself under the hood.
>>>>
>>>> Actually, IgniteContext is the base class for the Ignite <-> Spark integration
>>>> for now. So I tried to reuse it here. I like the idea to remove the explicit
>>>> usage of IgniteContext.
>>>> I will implement it in a few days.
>>>>
>>>>> Actually, I think it makes sense to create a builder similar to
>>>> SparkSession.builder()...
>>>>
>>>> Great idea! I will implement such a builder in a few days.
>>>>
>>>>> 9. Do I understand correctly that IgniteCacheRelation is for the case
>>>> when we don't have SQL configured on Ignite side?
>>>>
>>>> Yes, IgniteCacheRelation is a Data Frame implementation for a key-value
>>>> cache.
>>>>
>>>>> I thought we decided not to support this, no? Or this is something else?
>>>>
>>>> My understanding is the following:
>>>>
>>>> 1. We can’t support automatic resolving of key-value caches in
>>>> *ExternalCatalog*, because there is no way to reliably detect key and value
>>>> classes.
>>>>
>>>> 2. We can support key-value caches in the regular Data Frame implementation,
>>>> because we can require the user to provide key and value classes explicitly.
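>>>>
>>>> Something along these lines (format and option names are hypothetical,
>>>> just to show the idea):
>>>>
>>>> ```
>>>> // The user names key and value classes explicitly, so the relation
>>>> // can build a schema for a plain key-value cache without SQL:
>>>> val df = spark.read
>>>>     .format("ignite")                        // illustrative names
>>>>     .option("cache", "testCache")
>>>>     .option("keyClass", "java.lang.Long")
>>>>     .option("valueClass", "mypackage.Person")
>>>>     .load()
>>>> ```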
>>>>
>>>>> 8. Can you clarify the query syntax in
>>>> IgniteDataFrameExample#nativeSparkSqlFromCacheExample2?
>>>>
>>>> Key-value cache:
>>>>
>>>> key - java.lang.Long,
>>>> value - case class Person(name: String, birthDate: java.util.Date)
>>>>
>>>> The schema of the data frame for this cache is:
>>>>
>>>> key - long
>>>> value.name - string
>>>> value.birthDate - date
>>>>
>>>> So we can select data from the cache:
>>>>
>>>> SELECT
>>>>   key, `value.name`,  `value.birthDate`
>>>> FROM
>>>>   testCache
>>>> WHERE key >= 2 AND `value.name` like '%0'
>>>>
>>>> [1] https://github.com/apache/ignite/pull/2742/commits/faf3ed6febf417bc59b0519156fd4d09114c8da7
>>>> [2] https://issues.apache.org/jira/browse/IGNITE-3084?focusedCommentId=15794210&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15794210
>>>> [3] https://issues.apache.org/jira/browse/SPARK-17767?focusedCommentId=15543733&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15543733
>>>>
>>>>
>>>> On 18.10.2017 04:39, Dmitriy Setrakyan wrote:
>>>>
>>>>> Val, thanks for the review. Can I ask you to add the same comments to the
>>>>> ticket?
>>>>>
>>>>> On Tue, Oct 17, 2017 at 3:20 PM, Valentin Kulichenko <
>>>>> valentin.kulichenko@gmail.com> wrote:
>>>>>
>>>>>> Nikolay, Anton,
>>>>>>
>>>>>> I did a high level review of the code. First of all, impressive results!
>>>>>> However, I have some questions/comments.
>>>>>>
>>>>>> 1. Why do we have org.apache.spark.sql.ignite package in our codebase? Can
>>>>>> these classes reside under org.apache.ignite.spark instead?
>>>>>> 2. IgniteRelationProvider contains multiple constants which I guess are
>>>>>> some kind of config options. Can you describe the purpose of each of them?
>>>>>> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog
>>>>>> implementations and what is the difference?
>>>>>> 4. IgniteStrategy and IgniteOptimization are currently no-op. What are our
>>>>>> plans on implementing them? Also, what exactly is planned in
>>>>>> IgniteOptimization and what is its purpose?
>>>>>> 5. I don't like that IgniteStrategy and IgniteOptimization have to be set
>>>>>> manually on SQLContext each time it's created. This seems to be very error
>>>>>> prone. Is there any way to automate this and improve usability?
>>>>>> 6. What is the purpose of IgniteSparkSession? I see it's used
>>>>>> in IgniteCatalogExample but not in IgniteDataFrameExample, which is
>>>>>> confusing.
>>>>>> 7. To create IgniteSparkSession we first create IgniteContext. Is it really
>>>>>> needed? It looks like we can directly provide the configuration file; if
>>>>>> IgniteSparkSession really requires IgniteContext, it can create it by
>>>>>> itself under the hood. Actually, I think it makes sense to create a builder
>>>>>> similar to SparkSession.builder(), it would be good if our APIs here are
>>>>>> consistent with Spark APIs.
>>>>>> 8. Can you clarify the query syntax
>>>>>> in IgniteDataFrameExample#nativeSparkSqlFromCacheExample2?
>>>>>> 9. Do I understand correctly that IgniteCacheRelation is for the case when
>>>>>> we don't have SQL configured on Ignite side? I thought we decided not to
>>>>>> support this, no? Or this is something else?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> -Val
>>>>>>
>>>>>> On Tue, Oct 17, 2017 at 4:40 AM, Anton Vinogradov <
>>>>>> avinogradov@gridgain.com> wrote:
>>>>>>
>>>>>>> Sounds awesome.
>>>>>>>
>>>>>>> I'll try to review API & tests this week.
>>>>>>>
>>>>>>> Val,
>>>>>>> Your review still required :)
>>>>>>>
>>>>>>> On Tue, Oct 17, 2017 at 2:36 PM, Николай Ижиков <
>>>>>>> nizhikov.dev@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes
>>>>>>>>
>>>>>>>> On Oct 17, 2017, 2:34 PM, "Anton Vinogradov" <
>>>>>>>> avinogradov@gridgain.com> wrote:
>>>>>>>>
>>>>>>>>> Nikolay,
>>>>>>>>>
>>>>>>>>> So, it will be able to start regular spark and ignite clusters and,
>>>>>>>>> using peer classloading via spark-context, perform any DataFrame
>>>>>>>>> request, correct?
>>>>>>>>>
>>>>>>>>> On Tue, Oct 17, 2017 at 2:25 PM, Николай Ижиков <
>>>>>>>>> nizhikov.dev@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello, Anton.
>>>>>>>>>>
>>>>>>>>>> The example you provide is a path to a *local* file on the master.
>>>>>>>>>> These libraries are added to the classpath of each remote node
>>>>>>>>>> running the submitted job.
>>>>>>>>>>
>>>>>>>>>> Please, see the documentation:
>>>>>>>>>>
>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addJar(java.lang.String)
>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addFile(java.lang.String)
>>>>>>>>>>
>>>>>>>>>> 2017-10-17 13:10 GMT+03:00 Anton Vinogradov <
>>>>>>>>>> avinogradov@gridgain.com>:
>>>>>>>>>>
>>>>>>>>>>> Nikolay,
>>>>>>>>>>>
>>>>>>>>>>>> With Data Frame API implementation there are no requirements to
>>>>>>>>>>>> have any Ignite files on spark worker nodes.
>>>>>>>>>>>
>>>>>>>>>>> What do you mean? I see code like:
>>>>>>>>>>>
>>>>>>>>>>> spark.sparkContext.addJar(MAVEN_HOME +
>>>>>>>>>>> "/org/apache/ignite/ignite-core/2.3.0-SNAPSHOT/ignite-core-2.3.0-SNAPSHOT.jar")
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 16, 2017 at 5:22 PM, Николай Ижиков <
>>>>>>>>>>> nizhikov.dev@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello, guys.
>>>>>>>>>>>>
>>>>>>>>>>>> I have created an example application to run Ignite Data Frames on a
>>>>>>>>>>>> standalone Spark cluster.
>>>>>>>>>>>> With the Data Frame API implementation there are no requirements to
>>>>>>>>>>>> have any Ignite files on Spark worker nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> I ran this application on a free dataset: ATP tennis match
>>>>>>>>>>>> statistics.
>>>>>>>>>>>>
>>>>>>>>>>>> data - https://github.com/nizhikov/atp_matches
>>>>>>>>>>>> app - https://github.com/nizhikov/ignite-spark-df-example
>>>>>>>>>>>>
>>>>>>>>>>>> Valentin, do you have a chance to look at my changes?
>>>>>>>>>>>>
>>>>>>>>>>>> 2017-10-12 6:03 GMT+03:00 Valentin Kulichenko <
>>>>>>>>>>>> valentin.kulichenko@gmail.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Nikolay,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry for the delay on this, got a little swamped lately. I will do
>>>>>>>>>>>>> my best to review the code this week.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Val
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 9, 2017 at 11:48 AM, Николай Ижиков <
>>>>>>>>>>>>> nizhikov.dev@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello, Valentin.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did you have a chance to look at my changes?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now I think I have done almost all required features.
>>>>>>>>>>>>>> I want to make some performance tests to ensure my implementation
>>>>>>>>>>>>>> works properly with a significant amount of data.
>>>>>>>>>>>>>> And I definitely need some feedback for my changes.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2017-10-09 18:45 GMT+03:00 Николай Ижиков <
>>>>>>>>>>>>>> nizhikov.dev@gmail.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello, guys.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Which version of Spark do we want to use?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Currently, Ignite depends on Spark 2.1.0.
>>>>>>>>>>>>>>>      * Can be run on JDK 7.
>>>>>>>>>>>>>>>      * Still supported: 2.1.2 will be released soon.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. The latest Spark version is 2.2.0.
>>>>>>>>>>>>>>>      * Can be run only on JDK 8+.
>>>>>>>>>>>>>>>      * Released Jul 11, 2017.
>>>>>>>>>>>>>>>      * Already supported by huge vendors (Amazon, for example).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note that in IGNITE-3084 I implement some internal Spark API.
>>>>>>>>>>>>>>> So it will take some effort to switch between Spark 2.1 and 2.2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2017-09-27 2:20 GMT+03:00 Valentin Kulichenko <
>>>>>>>>>>>>>>> valentin.kulichenko@gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will review in the next few days.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Val
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Sep 26, 2017 at 2:23 PM, Denis Magda <
>>>>>>>>>>>>>>>> dmagda@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello Nikolay,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is good news. Finally this capability is coming to Ignite.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Val, Vladimir, could you do a preliminary review?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Answering your questions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Yardstick should be enough for performance measurements. As a
>>>>>>>>>>>>>>>>> Spark user, I will be curious to know what’s the point of this
>>>>>>>>>>>>>>>>> integration. Probably we need to compare Spark + Ignite and
>>>>>>>>>>>>>>>>> Spark + Hive or Spark + RDBMS cases.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2. If the Spark community is reluctant, let’s include the module
>>>>>>>>>>>>>>>>> in the ignite-spark integration.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> —
>>>>>>>>>>>>>>>>> Denis
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sep 25, 2017, at 11:14 AM, Николай Ижиков <
>>>>>>>>>>>>>>>>>> nizhikov.dev@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello, guys.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Currently, I’m working on integration between Spark and
>>>>>>>>>>>>>>>>>> Ignite [1].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For now, I implemented the following:
>>>>>>>>>>>>>>>>>>     * Ignite DataSource implementation (IgniteRelationProvider)
>>>>>>>>>>>>>>>>>>     * DataFrame support for Ignite SQL tables.
>>>>>>>>>>>>>>>>>>     * IgniteCatalog implementation for a transparent resolving
>>>>>>>>>>>>>>>>>>       of Ignite SQL tables.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The implementation can be found in PR [2].
>>>>>>>>>>>>>>>>>> It would be great if someone provides feedback for the
>>>>>>>>>>>>>>>>>> prototype.
>>>>>>>>>>>>>>>>>> I made some examples in the PR so you can see how the API is
>>>>>>>>>>>>>>>>>> supposed to be used [3], [4].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I need some advice. Can you help me?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. How should this PR be tested?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Of course, I need to provide some unit tests. But what about
>>>>>>>>>>>>>>>>>> scalability tests, etc.?
>>>>>>>>>>>>>>>>>> Maybe we need some Yardstick benchmark or similar?
>>>>>>>>>>>>>>>>>> What are your thoughts?
>>>>>>>>>>>>>>>>>> Which scenarios should I consider in the first place?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2. Should we provide the Spark Catalog implementation inside
>>>>>>>>>>>>>>>>>> the Ignite codebase?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The current implementation of the Spark Catalog is based on
>>>>>>>>>>>>>>>>>> *internal Spark API*.
>>>>>>>>>>>>>>>>>> The Spark community seems not interested in making the Catalog
>>>>>>>>>>>>>>>>>> API public or including the Ignite Catalog in the Spark code
>>>>>>>>>>>>>>>>>> base [5], [6].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> *Should we include a Spark internal API implementation inside
>>>>>>>>>>>>>>>>>> the Ignite code base?*
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Or should we consider including the Catalog implementation in
>>>>>>>>>>>>>>>>>> some external module that will be created and released outside
>>>>>>>>>>>>>>>>>> Ignite? (We still can support and develop it inside the Ignite
>>>>>>>>>>>>>>>>>> community.)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-3084
>>>>>>>>>>>>>>>>>> [2] https://github.com/apache/ignite/pull/2742
>>>>>>>>>>>>>>>>>> [3] https://github.com/apache/ignite/pull/2742/files#diff-f4ff509cef3018e221394474775e0905
>>>>>>>>>>>>>>>>>> [4] https://github.com/apache/ignite/pull/2742/files#diff-f2b670497d81e780dfd5098c5dd8a89c
>>>>>>>>>>>>>>>>>> [5] http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Core-Custom-Catalog-Integration-between-Apache-Ignite-and-Apache-Spark-td22452.html
>>>>>>>>>>>>>>>>>> [6] https://issues.apache.org/jira/browse/SPARK-17767
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Nikolay Izhikov
>>>>>>>>>>>>>>>>>> NIzhikov.dev@gmail.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Nikolay Izhikov
>>>>>>>>>>>>>>> NIzhikov.dev@gmail.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Nikolay Izhikov
>>>>>>>>>>>>>> NIzhikov.dev@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Nikolay Izhikov
>>>>>>>>>>>> NIzhikov.dev@gmail.com
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Nikolay Izhikov
>>>>>>>>>> NIzhikov.dev@gmail.com
>>>
>>> --
>>> Nikolay Izhikov
>>> NIzhikov.dev@gmail.com
>>>
> 
