calcite-dev mailing list archives

From Julian Hyde <>
Subject Re: Calcite - Spark Adapter
Date Sat, 05 Dec 2015 02:31:17 GMT
Victor Giannakouris - Salalidis wrote:
> At first I would like to introduce myself to the developers list. My name
> is Victor and I am an undergraduate computer science student. Currently I
> am doing some research on query optimization.

Nice to meet you!

> I am quite new to the Calcite project and I am facing some issues. My major
> problem is how I could use the Calcite Spark adapter programmatically? I cannot
> find such a tutorial or documentation.

The Spark adapter is not quite like the other adapters. Most adapters
(e.g. MongoDB or JDBC) have their own metadata, so they can tell
Calcite which tables they contain, by implementing the SchemaFactory
SPI. If you use one of those tables, Calcite will push down as much
processing as it can to execute in the source system.
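To make the SchemaFactory SPI concrete, here is a minimal sketch of an adapter's entry point. The class name and the empty schema are hypothetical; a real adapter would return a schema that describes the tables in its source system:

```java
import java.util.Map;

import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.SchemaFactory;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.impl.AbstractSchema;

/**
 * Minimal sketch of the SchemaFactory SPI (class name is hypothetical).
 * Calcite instantiates this from a model file and calls create()
 * to obtain the adapter's schema.
 */
public class MySchemaFactory implements SchemaFactory {
  @Override
  public Schema create(SchemaPlus parentSchema, String name,
      Map<String, Object> operand) {
    // An empty schema; a real adapter would override
    // AbstractSchema.getTableMap() to expose its tables.
    return new AbstractSchema();
  }
}
```

The `operand` map carries the adapter-specific options (connection strings, directories, and so on) from the JSON model file.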

Spark is an engine. It doesn't have its own data. Therefore Calcite
uses it to implement the operations after the data has left the data
source.
Drill and Flink are also engines, as are Calcite's native "enumerable"
physical relational operators, so I would like to generalize the
"engine" interface so that you can plug in the engine of your choice,
regardless of where you are reading the data from.

The other thing to be said about Spark is that Calcite's adapter is
based on an old version of Spark (0.9.0-incubating).

> Second, how can I run queries on SparkSQL via sqlline?

SparkSQL doesn't need Calcite - it has its own parser and planner.
(Not as good as Calcite, obviously!) If I understand SparkSQL's
architecture correctly, the only piece that is missing is a JDBC
driver. Someone from the Spark community could implement a JDBC
driver, and Avatica would be a good way to do that. They can use
Avatica to implement the JDBC APIs and as the RPC layer, and they
don't need to pull in anything else from Calcite.
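To illustrate what such a driver would look like from the client side, here is a sketch of connecting to an Avatica server over JDBC. The host, port, and query are hypothetical; the driver class and URL prefix are Avatica's:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Sketch: querying a hypothetical Avatica server at localhost:8765. */
public class AvaticaClient {
  public static void main(String[] args) throws Exception {
    // Avatica's thin JDBC driver; the RPC layer is HTTP.
    Class.forName("org.apache.calcite.avatica.remote.Driver");
    try (Connection conn = DriverManager.getConnection(
            "jdbc:avatica:remote:url=http://localhost:8765");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("select 1")) {
      while (rs.next()) {
        System.out.println(rs.getInt(1));
      }
    }
  }
}
```

The same URL works from sqlline, so a SparkSQL-backed Avatica server would immediately be usable from any JDBC tool.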

wangzhenhua wrote:
> I'm also interested in how calcite can be integrated into sparksql so
> that spark can use calcite to optimize queries?
> Do we have such APIs? If not, is there any roadmap to do this?

I don't think you could plug Calcite into SparkSQL very easily. But
you could use Calcite as a SQL front end and an optimizer. That is
what Calcite's Spark adapter/engine does. We should enhance it to read
whatever metadata source SparkSQL uses (so that we are seeing the same
set of tables as SparkSQL) but no one is working on that right now.
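To show how Calcite's Spark adapter/engine is selected today, here is a minimal sketch. The `spark` connection property asks Calcite to use Spark as the execution engine; everything else (schema registration, the query itself) is left as a comment because it depends on your data source:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

/** Sketch: Calcite as SQL front end with Spark as the engine. */
public class CalciteSparkDemo {
  public static void main(String[] args) throws Exception {
    Properties info = new Properties();
    // Ask Calcite to generate Spark code rather than use the
    // default "enumerable" engine.
    info.setProperty("spark", "true");
    try (Connection conn =
            DriverManager.getConnection("jdbc:calcite:", info)) {
      // Register schemas via
      // conn.unwrap(CalciteConnection.class).getRootSchema(),
      // then run queries with conn.createStatement().
    }
  }
}
```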
