db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Frank Griffin <...@roadrunner.com>
Subject Re: SQL/MED
Date Fri, 29 Feb 2008 15:21:23 GMT
Rick Hillegas wrote:
> 1) This could grow arbitrarily complicated, depending on how much
> heavy lifting you want the optimizer to do for you. With a little
> luck, you could probably trick the optimizer into treating the foreign
> tables like Table Functions. That would skirt some of the interesting
> optimizer issues.
> 2) A later implementation could get the optimizer involved and you
> could build some interesting cost model for remote tables.
> 3) Alternatively, you might be able to get an optimizer expert like
> Army involved early on. He might be able to help you divide and
> conquer this one.

The Derby and SQL/MED optimizer support look pretty similar.  SQL/MED
has GetReplyFirstCost(), GetReplyExecCost(), and GetReplyReExecCost()
interface functions that the wrapper (i.e. Table Function) supplies to
the server.  IIRC Table Functions have the same concepts.

For the initial pass, I sort of assumed that the ability to push
predicates to a foreign server capable of handling them would minimize
the effect of aggressive optimization within Derby, no ?

> You might want to start a wiki page on this topic and take it from there.

Will do.  Is there a page template I should use ?  Or a project proposal
page format used elsewhere ?

As a quick summary, here's the way I *think* SQL/MED works, based on
some initial reading....

In passthrough mode, the server (Derby) can send the SQL for the
subquery directly to the wrapper; this obviously only works for foreign
SQL servers.  In decomposition mode, the server parses the SQL, and
separates out the subqueries directed at tables controlled by the
foreign wrapper.  The server makes interface routines available to the
wrapper which allow it to navigate the subquery parse tree in the
SQL/MED format.

There are almost no data structures involved in this.  It's all
navigational interface calls which return individual bits of
information.  That means that the SQL/MED parse model and the server
parse model don't have to agree, as long as the server can supply the
information requested by the interface routines.

Conceptually, there is something called a Value Expression that
corresponds to a syntax token (simple or complex) in the subquery.  If
the token is complex, the VE has child VEs, and VEs get broken down to
child VEs until only the four basic VE types --- COLUMN NAME, CONSTANT,
OPERATOR, and PARAMETER --- are involved.  The SQL/MED model appears to
assume that all of these VEs, whether for tokens in a SELECT list,
expressions in a WHERE or ORDER BY, or whatever, are identified and
strung out in linear fashion, sort of like an array, at least for
Replies (see below).  The interface calls allow you to request the Nth
VE of a certain type, e.g. the Nth item in the SELECT list or the Nth
ORDER BY item.

The server provides the wrapper with a Request Handle that represents a
subquery.  The wrapper navigates through the subquery VEs and creates a
Reply Handle which is conceptually a copy of the Request with the VEs
the wrapper can't handle removed.  Just as the server provides
navigational routines to the wrapper which allow it to walk the Request,
the wrapper must provide a parallel set of navigational routines to the
server to allow it to walk the Reply.  I'm not yet sure why, but the
Request VEs are represented by "handles", while the Reply VEs are
represented by ordinal numbers (see the array reference above);
otherwise the set of navigational routines for Requests is pretty much a
mirror image of the set for Replies.

The server is responsible for detecting any differences between the
Request and Reply, and changing its execution plan to pick up the
wrapper's slack (so to speak). It then requests the wrapper to Open the
various Replies involved, and Iterates through them, using concepts very
similar to JDBC ResultSet, and incorporating the retrieved data into its
execution plan very much like data supplied by a Table Function.

I see this affecting several areas:

1) Derby DDL will need to be extended to include the various CREATE
FOREIGN XXXX statements.  I'd need to be pointed to the changes that
were made to add CREATE FUNCTION support for Table Functions.

2) Navigational routines need to be added which expose the Derby parse
tree for subqueries to wrappers.  This should be new code, but I'd need
to be pointed to code or docs for the parse tree model.

3) Derby support for incorporating wrappers into its execution plans
*should* be quite similar to whatever was done for Table Functions.  The
main difference will be new code to  analyze a Reply using the wrapper's
navigational routines, compare it to the original Request, and remove
any Derby predicate handling that was successfully pushed to the wrapper.

4) Data Access while iterating through a ResultSet is conceptually the
same as JDBC, but the mechanics are not identical, since SQL/MED was
designed to be language-neutral and Java isn't one of the languages
specifically mentioned.  There will probably need to be new access code
written to retrieve rows/columns from wrappers, and I'd need to know
where to insert this.

While I intend to make the architecture extensible to the full spec, I
was thinking of limiting support in the first pass to SELECTs only.  I
think the spec allows any sort of SQL to be passed to a wrapper, but I
haven't gotten that far yet.

View raw message