ignite-dev mailing list archives

From Nikolay Izhikov <nizhi...@apache.org>
Subject [DISCUSSION] Spark Data Frame through Thin Client
Date Sat, 20 Oct 2018 21:09:13 GMT
Hello, Igniters.

Currently, the Spark Data Frame integration is implemented via a client node connection.
Whenever we need to retrieve data from Ignite into a Spark worker (or master), we start a
client node.
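
For reference, here is a minimal sketch of the current usage, assuming the standard ignite-spark
data source; the configuration path and the "person" table are only illustrative, and spark is an
already created SparkSession:

import org.apache.ignite.spark.IgniteDataFrameSettings._

val persons = spark.read
  .format(FORMAT_IGNITE)
  // The XML config must exist at the same absolute path on every worker.
  .option(OPTION_CONFIG_FILE, "/opt/ignite/config/default-config.xml")
  .option(OPTION_TABLE, "person")
  .load()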

This approach has several major disadvantages:

	1. We have to copy the whole Ignite distribution to each Spark worker [1].
	2. We have to copy the whole Ignite distribution to the Spark master to make the catalogue work.
	3. The Ignite configuration file must be at the same absolute path on every worker, and that
path has to be provided during Data Frame construction [2].
	4. We have to additionally configure the Spark workers' classpath to include the Ignite libraries.

For now, almost every operation we need for the Spark Data Frame integration is supported by
the Java Thin Client (see the sketch after the list below):
	* obtain the list of caches;
	* get a cache configuration;
	* execute an SQL query;
	* stream data to a table - not supported by the thin client yet, but it can be implemented
using plain SQL INSERT statements.
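
A minimal sketch of these operations against the existing Java Thin Client API (server addresses,
cache and table names are illustrative):

import scala.collection.JavaConverters._
import org.apache.ignite.Ignition
import org.apache.ignite.cache.query.SqlFieldsQuery
import org.apache.ignite.configuration.ClientConfiguration

// Connect with nothing but server addresses: no local Ignite node, no XML configuration file.
val client = Ignition.startClient(
  new ClientConfiguration().setAddresses("srv-1:10800", "srv-2:10800"))

try {
  // 1. Obtain the list of caches.
  val cacheNames = client.cacheNames().asScala

  // 2. Get a cache configuration.
  val personCacheCfg = client.cache("SQL_PUBLIC_PERSON").getConfiguration

  // 3. Execute an SQL query.
  val rows = client.query(new SqlFieldsQuery("SELECT id, name FROM Person")).getAll.asScala

  // 4. Stream data to a table via a plain SQL INSERT (no streamer in the thin client yet).
  client.query(new SqlFieldsQuery("INSERT INTO Person(id, name) VALUES(?, ?)")
    .setArgs(Long.box(1L), "John Doe")).getAll
} finally {
  client.close()
}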

Advantages of using the Java Thin Client in the Spark integration (these are the well-known
Java Thin Client advantages):
	1. Easy to configure: only the IP addresses of the server nodes are required.
	2. Easy to deploy: only one additional jar is required. No server-side (Ignite worker)
configuration is required.

I propose to implement the Spark Data Frame integration through the Java Thin Client.
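
For illustration only, the user-facing configuration could then shrink to something like the
sketch below (reusing the imports from the first sketch). The "serverAddresses" option is
hypothetical and does not exist today; it only shows that a list of host:port pairs would
replace the XML configuration file:

val persons = spark.read
  .format(FORMAT_IGNITE)
  .option("serverAddresses", "srv-1:10800,srv-2:10800") // hypothetical option name
  .option(OPTION_TABLE, "person")
  .load()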

Thoughts?

[1] https://apacheignite-fs.readme.io/docs/installation-deployment
[2] https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-ignite-dataframe-options
