calcite-dev mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Integrating calcite into a sql processing pipeline
Date Mon, 06 Jul 2015 01:21:15 GMT
Drill has its own distributed execution framework.  As such, its physical
algebra and traits (physical properties) include distribution and
distributed physical operators.  The transformation between Calcite's world
and Drill's internal world can be seen in these classes:

Expression transformation: [1]
Algebra transformation: the custom implementations that extend [2],
specifically their implementations of getPhysicalOperator()


[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
[2]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/Prel.java
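
To make the algebra transformation a little more concrete, here is a toy
sketch of the pattern only. It is not Drill's actual code: the types below
(Prel, PhysicalOperator, ScanPrel, FilterPrel) are hypothetical stand-ins
that borrow the names for readability, and the real getPhysicalOperator()
in [2] has a different signature. The idea shown is that each planner-side
node converts its inputs first and then wraps them in its own engine-side
operator.

```java
public class PrelSketch {
    /** Hypothetical engine-side operator; a stand-in, not Drill's class. */
    interface PhysicalOperator {
        String describe();
    }

    /** Hypothetical planner-side node that knows how to convert itself. */
    interface Prel {
        PhysicalOperator getPhysicalOperator();
    }

    /** Leaf node: reads a table. */
    static final class ScanPrel implements Prel {
        private final String table;

        ScanPrel(String table) {
            this.table = table;
        }

        @Override
        public PhysicalOperator getPhysicalOperator() {
            return () -> "Scan(" + table + ")";
        }
    }

    /** Single-input node: converts its input first, then wraps it. */
    static final class FilterPrel implements Prel {
        private final Prel input;
        private final String condition;

        FilterPrel(Prel input, String condition) {
            this.input = input;
            this.condition = condition;
        }

        @Override
        public PhysicalOperator getPhysicalOperator() {
            PhysicalOperator child = input.getPhysicalOperator();
            return () -> "Filter(" + condition + ", " + child.describe() + ")";
        }
    }

    /** Walks the planner tree from the root and renders the engine-side tree. */
    static String convert(Prel root) {
        return root.getPhysicalOperator().describe();
    }

    public static void main(String[] args) {
        Prel plan = new FilterPrel(new ScanPrel("employees"), "salary > 1000");
        System.out.println(convert(plan)); // Filter(salary > 1000, Scan(employees))
    }
}
```

In this sketch the planner tree and the operator tree have the same shape;
a real engine would also carry distribution traits and might insert
exchange operators during the conversion.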

On Sun, Jul 5, 2015 at 6:12 PM, Adeel Qureshi <adeelmahmood@gmail.com>
wrote:

> Thanks for the info. I will look into the code.
>
> I'm a bit curious about how Drill uses only the parse and plan stages.
> Does Drill extract the parsed relational operator information from
> Calcite and then come up with a distributed plan to execute the query?
> And does it eventually pass the smaller query fragments to Calcite for
> final processing, or does it use the "parsed" information gathered from
> Calcite to run the query itself?
>
> In case I get a comment from someone that this is becoming more drill
> specific :) - I think a framework like calcite can probably be best
> understood within the context of a project that uses it.
>
> Adeel
>
> On Saturday, July 4, 2015, Jacques Nadeau <jacques@apache.org> wrote:
>
> > I think you hit the nail on the head.  Let's say that there are three
> > main capabilities of Calcite:
> >
> > Parse > Plan > Execute
> >
> > Hive uses Calcite for Plan only
> > Drill uses Calcite for Parse > Plan
> > Kylin uses Calcite for Parse > Plan > Execute (partially)
> >
> > The APIs for each are different.  I would look at the implementation
> > that matches the set of things you want to take advantage of.  I know
> > the Drill code best, so you can see how we use Calcite in these places
> > [1] and [2].  Note that these are the entry points and we use the
> > Frameworks API.  There is way more code than this that we use around
> > rules, rels, expressions etc., but this is a good overview of how to
> > hook into Calcite to Parse > Plan.
> >
> > [1]
> > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
> > [2]
> > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
> >
> >
> >
> > On Sat, Jul 4, 2015 at 4:19 PM, Adeel Qureshi <adeelmahmood@gmail.com>
> > wrote:
> >
> > > I have gone through the documentation on the Calcite website. The gist
> > > of what I got out of it is that I can pass a SQL statement to the
> > > Calcite engine/framework, tell it how to read my data using the
> > > SchemaFactory, Schema, Table and Enumerator implementation classes,
> > > and it will apply that query to my dataset and return the results.
> > > This makes sense and works as long as you want to hand off the
> > > processing of the SQL to Calcite completely. But there are cases where
> > > you want to control the processing pipeline and potentially process
> > > the SQL differently, while still letting Calcite do the heavy lifting.
> > > Here are some examples:
> > >
> > > 1. My limited understanding of Apache Drill's query execution process
> > > is that it uses Calcite to come up with the logical plan for a SQL
> > > query. I am not sure whether it also constructs the physical plan
> > > using Calcite or whether that is internal to Drill. Either way, the
> > > physical plan is then distributed as fragments of SQL for different
> > > Drill nodes to process. Without getting too much into how Drill works,
> > > I am mostly interested in how Drill intervenes in Calcite's
> > > processing, which would otherwise take the SQL, apply it directly to
> > > the data, and return results. It's almost as if they only use Calcite
> > > to come up with the plan and then take it from there. Are those APIs
> > > exposed, and can someone point me to where I can find such code within
> > > the Calcite project? Basically, I want to be able to control the
> > > complete process of taking SQL, coming up with a logical or physical
> > > plan, and then applying it to some dataset.
> > >
> > > 2. Another example is Hive, which uses Calcite for its cost-based
> > > logical optimizer and essentially integrates that into its own flow of
> > > processing the SQL, instead of passing the SQL to Calcite and letting
> > > it do all the processing. This seems to be the pattern in which
> > > Calcite is used by these other projects (in a somewhat indirect way,
> > > not completely handing off the responsibility of processing SQL), but
> > > I have not been able to find any information on the Calcite site that
> > > helps with incorporating Calcite into other projects.
> > >
> > > I would appreciate some insight into the matter. Thanks.
> > >
> >
>
