calcite-dev mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Integrating calcite into a sql processing pipeline
Date Sat, 04 Jul 2015 23:43:42 GMT
I think you hit the nail on the head.  Let's say that there are three main
capabilities of Calcite:

Parse > Plan > Execute

Hive uses Calcite for Plan only
Drill uses Calcite for Parse > Plan
Kylin uses Calcite for Parse > Plan > Execute (partially)

The APIs for each are different.  I would look at the implementation that
matches the set of things you want to take advantage of.  I know the Drill
code best, so you can see how we use Calcite in these places [1] and [2].
Note that these are the entry points and we use the Frameworks API.  We have
far more code than this around rules, rels, expressions, etc., but this is a
good overview of how to hook into Calcite for Parse > Plan.

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
[2]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
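
As a rough illustration of the Parse > Plan path through the Frameworks API,
here is a minimal sketch. It is not Drill's code: the class name ParseAndPlan
and the sample query are made up for the example, the root schema is left
empty (so only table-free queries will validate), and it assumes a
calcite-core release in which Planner.rel returns a RelRoot. It stops at the
unoptimized logical plan; rule-based optimization and execution are left to
the host system.

```java
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

/** Hypothetical example class: parse and plan a query, but do not execute it. */
public class ParseAndPlan {

    /** Returns a textual dump of the logical plan for the given SQL. */
    public static String plan(String sql) {
        // Assumption: an empty root schema. A real integration would register
        // its own Schema/Table implementations here.
        SchemaPlus rootSchema = Frameworks.createRootSchema(true);

        FrameworkConfig config = Frameworks.newConfigBuilder()
            .defaultSchema(rootSchema)
            .build();

        Planner planner = Frameworks.getPlanner(config);
        try {
            SqlNode parsed = planner.parse(sql);             // Parse: SQL text -> AST
            SqlNode validated = planner.validate(parsed);    // resolve names and types
            RelNode logical = planner.rel(validated).project(); // Plan: AST -> relational algebra
            return RelOptUtil.toString(logical);
        } catch (Exception e) {
            // Parsing, validation and conversion each throw checked exceptions.
            throw new IllegalStateException("Could not plan: " + sql, e);
        } finally {
            planner.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(plan("SELECT 1 + 1 AS two"));
    }
}
```

From the RelNode onward the caller is in control: it can apply its own rules
and hand the resulting plan to its own execution engine instead of letting
Calcite run the query.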



On Sat, Jul 4, 2015 at 4:19 PM, Adeel Qureshi <adeelmahmood@gmail.com>
wrote:

> I have gone through the documentation provided at the calcite website and
> the gist of what I got out of it was that I can pass a SQL statement to the
> calcite engine/framework and tell it how to read my data using the
> SchemaFactory, Schema, Table and Enumerator implementation classes and it
> will be able to apply that query to my dataset and return the results of
> that query. This makes sense and works as long as you want to completely
> hand off the responsibility of processing the SQL to calcite but there are
> cases where you want to control the processing pipeline and potentially do
> things differently to process the SQL while still allowing calcite to do
> the heavy lifting in terms of processing SQL. Here are some examples:
>
> 1. My limited understanding of apache drill's query execution process is
> that it uses calcite to come up with the logical plan for a SQL query and I
> am not sure if it constructs the physical plan as well using calcite or
> that's something internal to drill. Either way the physical plan is then
> distributed into fragments of SQL for different drill nodes to process.
> Without getting too much into how drill works, I am mostly interested in
> how drill steps into calcite's usual flow of taking the SQL, applying it
> directly to the data, and returning results. It's almost like they only
> use calcite to come up with the plan and then take it from there. Are those
> APIs exposed, and can someone point me to where I can find such code within
> the calcite project? Basically, I want to be able to control the complete
> process of taking SQL, coming up with a logical or physical plan, and then
> applying it to some dataset.
>
> 2. Another example is Hive, which uses calcite as a cost-based logical
> optimizer and essentially integrates that into its own flow of processing
> the SQL instead of passing the SQL to calcite and letting it do all the
> processing. This seems to be the pattern in which calcite is being used by
> these other projects (in a somewhat indirect way, not completely handing
> off the responsibility of processing SQL), but I have not been able to find
> any information on the calcite site that can help with incorporating
> calcite into other projects.
>
> I would appreciate some insight into the matter. Thanks.
>
