calcite-dev mailing list archives

From Adeel Qureshi <>
Subject Re: Integrating calcite into a sql processing pipeline
Date Mon, 06 Jul 2015 01:12:38 GMT
Thanks for the info. I will look into the code.

I'm a bit curious about how Drill uses only the parse and plan stages. Does
Drill extract the parsed relational operator information from Calcite and
then come up with a distributed plan to execute the query? And does it
eventually pass the smaller query fragments to Calcite for final
processing, or does it use the "parsed" information gathered from Calcite
to run the query itself?
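To make the question concrete: an engine that takes only the plan from Calcite ends up holding a tree of `RelNode` operators that it walks and translates into its own operators. The sketch below is not Drill's code. It assumes a recent `calcite-core` on the classpath, uses a hypothetical in-memory `Hr` schema exposed through `ReflectiveSchema`, builds the tree with `RelBuilder` instead of parsing SQL to keep it short, and uses `RelVisitor` to visit inputs before parents, the order a bottom-up translation would follow.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.RelVisitor;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

public class PlanWalkSketch {
  /** Hypothetical data set; ReflectiveSchema turns each public array field into a table. */
  public static class Hr {
    public final Employee[] emps = {new Employee(100, "Bill", 10)};
  }

  /** Hypothetical row type; its public fields become columns. */
  public static class Employee {
    public final int empid;
    public final String name;
    public final int deptno;

    public Employee(int empid, String name, int deptno) {
      this.empid = empid;
      this.name = name;
      this.deptno = deptno;
    }
  }

  /** Builds the logical tree for: select name from hr.emps where deptno = 10. */
  public static RelNode buildPlan() {
    SchemaPlus root = Frameworks.createRootSchema(true);
    root.add("hr", new ReflectiveSchema(new Hr()));
    FrameworkConfig config = Frameworks.newConfigBuilder().defaultSchema(root).build();
    RelBuilder b = RelBuilder.create(config);
    return b.scan("hr", "emps")
        .filter(b.equals(b.field("deptno"), b.literal(10)))
        .project(b.field("name"))
        .build();
  }

  /** Lists operator names with inputs before parents (bottom-up order). */
  public static List<String> operatorsBottomUp(RelNode plan) {
    List<String> ops = new ArrayList<>();
    new RelVisitor() {
      @Override public void visit(RelNode node, int ordinal, RelNode parent) {
        super.visit(node, ordinal, parent); // recurse into the inputs first
        // A host engine would translate `node` into its own operator here.
        ops.add(node.getRelTypeName());
      }
    }.go(plan);
    return ops;
  }

  public static void main(String[] args) {
    System.out.println(operatorsBottomUp(buildPlan()));
  }
}
```

The scan's exact class name can differ between Calcite releases, so a translator would typically dispatch on operator kind (`TableScan`, `Filter`, `Project`) rather than on the concrete class.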

In case someone comments that this is becoming too Drill-specific :) I think
a framework like Calcite is probably best understood within the context of a
project that uses it.


On Saturday, July 4, 2015, Jacques Nadeau <> wrote:

> I think you hit the nail on the head.  Let's say that there are three main
> capabilities of Calcite:
> Parse > Plan > Execute
> Hive uses Calcite for Plan only
> Drill uses Calcite for Parse > Plan
> Kylin uses Calcite for Parse > Plan > Execute (partially)
> The APIs for each are different. I would look at the implementation that
> matches the set of things you want to take advantage of. I know the Drill
> code best, so you can see how we use Calcite in these places: [1] and [2].
> Note that these are just the entry points, and that we use the Frameworks
> API. There is way more code than this that we use around rules, rels,
> expressions, etc., but this is a good overview of how to hook into Calcite
> for Parse > Plan.
> [1]
> [2]
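A minimal sketch of a Parse > Plan entry point through the Frameworks API, for illustration only (it is not Drill's integration code). It assumes a recent `calcite-core` release, in which `Planner.rel` returns a `RelRoot`, and a hypothetical `Hr` schema exposed through `ReflectiveSchema`. It stops at the logical `RelNode` tree and never executes anything. Identifiers in the query are double-quoted because the default parser settings upper-case unquoted identifiers, while `ReflectiveSchema` keeps Java's lower-case field names.

```java
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

public class ParsePlanSketch {
  /** Hypothetical data set; each public array field becomes a table. */
  public static class Hr {
    public final Employee[] emps = {
        new Employee(100, "Bill", 10),
        new Employee(200, "Eric", 20),
    };
  }

  /** Hypothetical row type; its public fields become columns. */
  public static class Employee {
    public final int empid;
    public final String name;
    public final int deptno;

    public Employee(int empid, String name, int deptno) {
      this.empid = empid;
      this.name = name;
      this.deptno = deptno;
    }
  }

  /** Parses, validates and converts SQL into a logical plan without executing it. */
  public static String planFor(String sql) throws Exception {
    SchemaPlus rootSchema = Frameworks.createRootSchema(true);
    rootSchema.add("hr", new ReflectiveSchema(new Hr()));
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(rootSchema)
        .build();
    Planner planner = Frameworks.getPlanner(config);
    try {
      SqlNode parsed = planner.parse(sql);                // Parse: SQL text -> SqlNode
      SqlNode validated = planner.validate(parsed);       // resolve tables, columns, types
      RelNode logical = planner.rel(validated).project(); // Plan: SqlNode -> RelNode tree
      return RelOptUtil.toString(logical);
    } finally {
      planner.close();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(planFor(
        "select \"name\" from \"hr\".\"emps\" where \"deptno\" = 10"));
  }
}
```

From here, an engine that wants only Parse > Plan would optimize and translate the `RelNode` tree itself instead of asking Calcite to execute it.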
> On Sat, Jul 4, 2015 at 4:19 PM, Adeel Qureshi <> wrote:
> > I have gone through the documentation on the Calcite website, and the
> > gist of what I got out of it is that I can pass a SQL statement to the
> > Calcite engine/framework, tell it how to read my data using SchemaFactory,
> > Schema, Table and Enumerator implementation classes, and it will apply
> > that query to my dataset and return the results. This makes sense and
> > works as long as you want to hand off the responsibility of processing the
> > SQL to Calcite completely, but there are cases where you want to control
> > the processing pipeline and potentially do things differently while still
> > letting Calcite do the heavy lifting of processing SQL. Here are some
> > examples:
> >
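For contrast, the full hand-off described in the paragraph above (Calcite parses, plans and executes) might look like the following sketch. It assumes a recent `calcite-core` on the classpath, registers a hypothetical `EmpsTable` (a `ScannableTable` over an in-memory list, standing in for a custom Table/Enumerator implementation) under a hypothetical `HR` schema, and queries it through the `jdbc:calcite:` driver. Table and column names are registered in upper case because unquoted identifiers are upper-cased by the default lexical settings.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.calcite.DataContext;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.impl.AbstractSchema;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class FullExecutionSketch {
  /** Hypothetical table over in-memory rows; a real adapter would read its own storage. */
  public static class EmpsTable extends AbstractTable implements ScannableTable {
    private final List<Object[]> rows = Arrays.asList(
        new Object[] {100, "Bill", 10},
        new Object[] {200, "Eric", 20});

    @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
      return typeFactory.builder()
          .add("EMPID", SqlTypeName.INTEGER)
          .add("NAME", SqlTypeName.VARCHAR)
          .add("DEPTNO", SqlTypeName.INTEGER)
          .build();
    }

    // Calcite calls this to enumerate all rows; filtering and projection happen in Calcite.
    @Override public Enumerable<Object[]> scan(DataContext root) {
      return Linq4j.asEnumerable(rows);
    }
  }

  /** Hands the whole query to Calcite: parse, plan and execute. */
  public static List<String> namesInDept(int deptno) throws Exception {
    try (Connection connection = DriverManager.getConnection("jdbc:calcite:")) {
      CalciteConnection calcite = connection.unwrap(CalciteConnection.class);
      SchemaPlus hr = calcite.getRootSchema().add("HR", new AbstractSchema());
      hr.add("EMPS", new EmpsTable());
      List<String> names = new ArrayList<>();
      try (Statement stmt = connection.createStatement();
           ResultSet rs = stmt.executeQuery(
               "select name from hr.emps where deptno = " + deptno)) {
        while (rs.next()) {
          names.add(rs.getString(1));
        }
      }
      return names;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(namesInDept(10));
  }
}
```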
> > 1. My limited understanding of Apache Drill's query execution process is
> > that it uses Calcite to come up with the logical plan for a SQL query, and
> > I am not sure whether it also constructs the physical plan using Calcite
> > or whether that is internal to Drill. Either way, the physical plan is
> > then distributed into SQL fragments for different Drill nodes to process.
> > Without getting too deep into how Drill works, I am mostly interested in
> > how Drill intervenes in Calcite's processing of taking the SQL, applying
> > it directly to the data and returning results. It's almost as if they
> > only use Calcite to come up with the plan and then take it from there.
> > Are those APIs exposed, and can someone point me to where I can find such
> > code within the Calcite project? Basically, I want to be able to control
> > the complete process of taking SQL, coming up with a logical or physical
> > plan, and then applying it to some dataset.
> >
> > 2. Another example is Hive, which uses Calcite for its cost-based logical
> > optimizer and integrates that into its own flow of processing the SQL,
> > instead of passing the SQL to Calcite and letting it do all the
> > processing. This seems to be the pattern in which other projects use
> > Calcite (in a somewhat indirect way, without completely handing off the
> > responsibility of processing SQL), but I have not been able to find any
> > information on the Calcite site that helps with incorporating Calcite
> > into other projects.
> >
> > I would appreciate some insight into the matter. Thanks.
> >
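A Hive-style "Plan only" integration roughly means the host builds Calcite operators from its own parse tree, runs Calcite's rule engine over them and translates the result back. The sketch below is an illustration under stated assumptions, not Hive's code. It assumes `calcite-core` 1.25 or later (for `CoreRules`), a hypothetical `Hr` schema, `RelBuilder` standing in for the host's own AST-to-operator translation, and a single `FILTER_MERGE` rule run by the heuristic `HepPlanner`. A real cost-based setup would register many more rules and use a cost-based planner such as `VolcanoPlanner`.

```java
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

public class PlanOnlySketch {
  /** Hypothetical data set; each public array field becomes a table. */
  public static class Hr {
    public final Employee[] emps = {new Employee(100, "Bill", 10)};
  }

  /** Hypothetical row type; its public fields become columns. */
  public static class Employee {
    public final int empid;
    public final String name;
    public final int deptno;

    public Employee(int empid, String name, int deptno) {
      this.empid = empid;
      this.name = name;
      this.deptno = deptno;
    }
  }

  /** Stands in for the host engine translating its own AST into Calcite operators. */
  public static RelNode hostPlan() {
    SchemaPlus root = Frameworks.createRootSchema(true);
    root.add("hr", new ReflectiveSchema(new Hr()));
    FrameworkConfig config = Frameworks.newConfigBuilder().defaultSchema(root).build();
    RelBuilder b = RelBuilder.create(config);
    return b.scan("hr", "emps")
        .filter(b.equals(b.field("deptno"), b.literal(10)))
        .filter(b.call(SqlStdOperatorTable.GREATER_THAN, b.field("empid"), b.literal(50)))
        .project(b.field("name"))
        .build();
  }

  /** Runs a rule program over the host's tree and returns the rewritten tree. */
  public static RelNode optimize(RelNode plan) {
    HepProgram program = new HepProgramBuilder()
        .addRuleInstance(CoreRules.FILTER_MERGE) // combine the two adjacent filters
        .build();
    HepPlanner planner = new HepPlanner(program);
    planner.setRoot(plan);
    // The host would translate this tree back into its own operators.
    return planner.findBestExp();
  }

  public static void main(String[] args) {
    RelNode before = hostPlan();
    System.out.println(RelOptUtil.toString(before));
    System.out.println(RelOptUtil.toString(optimize(before)));
  }
}
```

The design point is that Calcite never sees SQL text here: the host owns parsing and execution, and Calcite only rewrites the operator tree in between.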
