db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-6211) Make Optimizer trace logic pluggable.
Date Wed, 07 Aug 2013 13:46:48 GMT

     [ https://issues.apache.org/jira/browse/DERBY-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Rick Hillegas updated DERBY-6211:
---------------------------------

    Attachment: derby-6211-12-aa-traceEndOfQueryBlock.diff

Attaching derby-6211-12-aa-traceEndOfQueryBlock.diff. This patch makes a couple improvements
to the xml-based optimizer tracing of outer join queries. Tests passed cleanly for me on this
patch.

The optimization of outer joins is interesting. In some cases, an outer join can be turned
into an inner join and optimized as a single query block. However, in other cases, the tables
in the outer join cannot be placed into a master join order with the other tables in the query.
The following query behaves as an inner join because the compiler sees that the "t3.c1 = t2.c1"
qualification will only be satisfied for rows in which t2 inner joined to t1 and therefore
t2's columns are not null:

a)
  select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t2.c1;

However, the following query may qualify rows in which t2 outer joined to t1 and t2's columns
are null. The following query cannot be turned into one big inner join:

b)
  select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t1.c1;

For query (b), the whole "(t1 left outer join t2 on t1.c1 = t2.c1)" clause is treated as a
separate query block. The situation is actually even more complicated than that, and we'll
get to that later. But at a high level, the optimizer starts out considering two join orders:

  [t3, QueryBock2]

and

  [QueryBlock2, t3]

While evaluating these join orders, the optimizer recurses and optimizes QueryBlock2. As you
can see, QueryBlock2 will be optimized twice.

This situation broke xml-based optimizer tracing in several ways:

1) The xml tracer died on an NPE when trying to represent the join order and give a compact
name to QueryBlock2. The fix was to give QueryBlock2 a generic name, viz., the class name
of the query tree node which sits on top of QueryBlock2.

2) Once that bug was fixed, it became apparent that there was no marker for the end of the
recursive optimization of QueryBlock2. This caused trace information to be added to the wrong
query block. The fix was to add some recursion to the xml tracer and to add a new trace method
which flags the end of an attempt to optimize a query block.

3) Once that bug was fixed, another variant of problem (1) surfaced: an NPE was raised trying
to give QueryBlock2 a compact name in the query plan summary. Again, that problem was fixed
by using the class name of the query tree node which sits on top of QueryBlock2.

As promised above, the situation is actually a bit more complicated. Remember that QueryBlock2
represents the following clause:

  (t1 left outer join t2 on t1.c1 = t2.c1)

The optimizer doesn't have many tricks for this outer join. It can't permute the join order.
t1 must be outer to t2. So, the optimizer treats this clause as two separate query blocks
and optimizes them separately. This happens because the HalfOuterJoinNode is really a TableOperatorNode,
just like a UNION. Each branch of the HalfOuterJoinNode is treated as its own query block
just as each branch of a UNION is treated as a separate, independent query block.

I hope that this explanation makes it easier to interpret the results of the following script.
You will notice two plans considered for query (b):

( ProjectRestrictNode * "APP"."SQL130807054111430" )

( "APP"."67bb80b4-0140-58cb-8920-00000383a238" * ProjectRestrictNode )

Here's the script:

connect 'jdbc:derby:memory:db;create=true';

create table t1
(
  c1 int,
  c2 int,
  c3 int,
  constraint cons1 primary key(c1, c2)
);

create table t2
(
  c1 int not null,
  c2 int not null,
  c3 int,
  constraint cons2 unique(c1, c2)
);

create table t3
(
  c1 int,
  c2 int,
  c3 int,
  constraint cons3 primary key(c1, c2)
);

select cast( t.tablename as varchar(2)), c.conglomeratename
from sys.systables t, sys.sysconglomerates c
where t.tableid = c.tableid
and t.tabletype = 'T'
order by t.tablename;

--turn on xml-based optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

select * from t1 left outer join t2 on t1.c1 = t2.c1;

select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t2.c1;

select * from t3, (t1 left outer join t2 on t1.c1 = t2.c1) where t3.c1 = t1.c1;

-- turn off optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'optimizerTrace.xml' );

-- load the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'optimizerTrace.xml' );

-- view the costs of all complete plans
select stmtID, qbID, estimatedCost, summary from planCost
where complete
order by stmtID, qbID, estimatedCost
;

-- unload the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );


Touches the following files:

---------------

M       java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java

Added recursive treatment of query blocks and fixed the NPEs.

---------------

M       java/engine/org/apache/derby/iapi/sql/compile/OptimizerPlan.java

Added a new kind of node in the OptimizerPlan: DeadEnd. This is a node for cases when the
xml tracer can't peer inside an Optimizable. For instance, when the Optimizable is another
query block.

---------------

M       java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
M       java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java

Changed the name of the traceStart() method to traceStartQueryBlock(). Added a new traceEndQueryBlock()
method.

---------------

M       java/engine/org/apache/derby/impl/sql/compile/TableOperatorNode.java

Adds a traceEndQueryBlock() call after a recursive optimization finishes.

                
> Make Optimizer trace logic pluggable.
> -------------------------------------
>
>                 Key: DERBY-6211
>                 URL: https://issues.apache.org/jira/browse/DERBY-6211
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.11.0.0
>            Reporter: Rick Hillegas
>            Assignee: Rick Hillegas
>              Labels: derby_triage10_11
>         Attachments: derby-6211-01-aa-createPlugin.diff, derby-6211-02-aa-cleanup.diff,
derby-6211-02-ab-cleanup.diff, derby-6211-03-aa-customTracer.diff, derby-6211-04-aa-moveOptimizerTracerToEngineJar.diff,
derby-6211-05-aa-xmlOptimizerTracer.diff, derby-6211-06-ab-packageProtect-XMLOptTrace.diff,
derby-6211-07-aa-useSchemaQualifiedNamesInSummaries.diff, derby-6211-07-ab-useSchemaQualifiedNamesInSummaries.diff,
derby-6211-08-aa-fixNPE.diff, derby-6211-09-aa-addTests.diff, derby-6211-10-aa-makingCostEstimateObject.diff,
derby-6211-11-aa-moveTracerOutOfOptimizer.diff, derby-6211-11-ab-moveTracerOutOfOptimizer.diff,
derby-6211-12-aa-traceEndOfQueryBlock.diff
>
>
> Right now the trace logic in the optimizer is hard-coded to produce a stream of diagnostics.
It would be good to be able to plug alternative trace logic into the optimizer. This would
make the following possible:
> 1) Plug in trace logic which produces formats which are easier to study and which can
be analyzed mechanically. E.g., xml formatted output.
> 2) Plug in trace logic which can be used during unit testing to verify that the optimizer
has picked the right plan. Over time this might make it easier to migrate canon-based tests
to assertion-based tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message