db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (DERBY-6211) Make Optimizer trace logic pluggable.
Date Tue, 11 Jun 2013 16:58:21 GMT

    [ https://issues.apache.org/jira/browse/DERBY-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680468#comment-13680468
] 

Rick Hillegas edited comment on DERBY-6211 at 6/11/13 4:57 PM:
---------------------------------------------------------------

Attaching derby-6211-05-aa-xmlOptimizerTracer.diff. This patch adds an optimizer tracer which
produces its output in xml. In addition, this patch adds an optional tool for viewing that
xml output using SQL. I am running tests now.

I find that it is very hard to read the existing optimizer traces for the following reasons:

i) The trace output is not indented to indicate how facts relate to one another.

ii) In particular, it is hard to piece together the shapes of the query plans which are being
evaluated.

I hope that this xml output is easier to read and understand. The output contains elements
which nest like the corresponding optimizer data structures:

A) statement - This is the text of an SQL statement which needs optimization.

B) queryBlock - A statement may have many query blocks. For instance, a UNION statement contains
many branches, each of which is its own query block. Most subqueries are also independent
query blocks. Each query block is optimized in isolation from the others. The isolation goes
so far that each query block gets its own optimizer instance.

C) joinOrder - For each query block, the optimizer may consider several join orders. These
are the left-to-right orders in which tables would be accessed at execution time. The optimizer
builds up a join order incrementally, starting from the leftmost position and adding more
tables as it goes. The optimizer may abandon a join order before it is completely filled out.
This happens if the optimizer determines that no completion of the join order can result in
a plan that's cheaper than the best plan found so far.

D) decoration - As the optimizer fills out the join order, it also considers which conglomerate
to use for each table and how to join the table to the table to its left. Of course, the leftmost
table doesn't join to anything before it, so for the leftmost table the join strategy is always
NESTED_LOOP.

The following other elements appear in the xml output:

E) planCost - The optimizer evaluates the costs of decorated join orders, including the costs
of decorated but partial join orders. The planCost element nests inside the joinOrder element.
In addition, each queryBlock contains a best planCost.

F) decConglomerateCost - This represents the cost of using a particular conglomerate to scan
a table. This element nests inside the decoration element.

In addition to presenting cost information, the planCost element presents two descriptions
of the decorated join order being evaluated:

1) summary - This is meant to be a compact, precise description of the plan which we might
consider using in an optimizer override. This description identifies conglomerates by the
(often cryptic) names stored in SYS.SYSCONGLOMERATES.

2) verbose - This is meant to be a more human-readable description of the plan. Tables are
identified by their SQL names or by their correlation names in the query. In addition, index
column names are included if the table is accessed by an index.

Although the optimizer considers how tables join leftward, English speakers will want to view
the join order rightward. That is how the descriptions are written. In addition, I have introduced
the following infix operators to represent the join strategies:

*               NESTED_LOOP
#               HASH_JOIN

I have also fully parenthesized the plan descriptions even though Derby only supports left-deep
parentheses today. In the future, Derby may support bushy join orders, requiring different
parenthesizing. Putting all of this together, here is a sample summary description:

((45b300a8-013f-33ba-d007-000003789be8 # b6b2c0ae-013f-33ba-d007-000003789be8) # 67bb80b4-013f-33ba-d007-000003789be8)
# d8cd40ba-013f-33ba-d007-000003789be8

...and here is the corresponding verbose description:

((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4

For the following query...

select tablename from sys.systables t, sys.syscolumns c, sys.sysaliases a, sys.syssequences
s
where t.tablename = c.columnname and c.columnname = a.alias and a.alias = s.sequencename

...here's a sample verbose plan description:

((C # T{TABLENAME,SCHEMAID}) # A{SCHEMAID,ALIAS,NAMESPACE}) # S{SCHEMAID,SEQUENCENAME}

I think that this xml output is much easier to read than traditional Derby optimizer traces.
If you use a browser like Firefox, you can collapse and expand elements in order to expose
just the information you want to see.

This patch also includes an optional tool (optimizerTracingViews) which gives you a SQL view
of all of the planCost elements in the xml output. Here's an example of how to use xml optimizer
tracing along with this optional viewing tool. Note that the tracer wants a file name argument
but the viewer wants a file url argument:

-- turn on xml-based optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

-- run this query through the tracer
select * from tab1, tab2, tab3, tab4 where -tab1.b = tab2.b and tab2.a = tab3.a and tab3.a
= tab4.a;

-- turn off optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z.xml' );

-- load the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'file:///Users/rh161140/derby/mainline/z.xml'
);

-- view the costs of all complete plans
select estimatedCost, verbose from planCost
where complete
order by estimatedCost
;

-- unload the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

Here is the output of the query against the planCost view:

ESTIMATEDCOST           |VERBOSE                                                         
                                                               
------------------------------------------------------------
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3                   
                                                               
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                   
                                                               
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                   
                                                               
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3                   
                                                               
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                   
                                                               
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                   
                                                               
18224.740429176647      |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1                   
                                                               
18224.740429176647      |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1                   
                                                               
27806.645417006024      |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1                   
                                                               
27806.645417006024      |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1                   
                                                               
27903.296430890066      |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1                   
                                                               
27903.296430890066      |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1                   
                                                               

Here is the full shape of the planCost view:

(
    text varchar( 32672 ),
    stmtID    int,
    qbID   int,
    complete  boolean,
    summary   varchar( 32672 ),
    verbose   varchar( 32672 ),
    type        varchar( 50 ),
    estimatedCost        double,
    estimatedRowCount    bigint
)

I think this functionality is useful enough now that other people can test-drive it. Before
writing regression tests for this patch, I would like to gather feedback from developers about
how to improve this basic functionality. For instance, is this readable enough? Is there some
information from optimizer traces which you often use but which is missing from the xml output?

Further improvements could include:

I) Adding more information to the xml trace. Right now, I have only implemented a subset of
the trace methods.

II) Adding more SQL views for reading the xml trace output.


Touches the following files:

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
M       java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java

Changes to the signatures of some of the trace methods so that they give the xml tracer the
information it needs.

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java

Each JoinStrategy now has an operator symbol for use in planCost descriptions.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java

The new xml trace logic.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/OptTraceViewer.java
M       java/engine/org/apache/derby/catalog/Java5SystemProcedures.java
M       tools/jar/extraDBMSclasses.properties

The new optional tool for viewing xml trace output.

                
      was (Author: rhillegas):
    Attaching derby-6211-05-aa-xmlOptimizerTracer.diff. This patch adds an optimizer tracer
which produces its output in xml. In addition, this patch adds an optional tool for viewing
that xml output using SQL. I am running tests now.

I find that it is very hard to read the existing optimizer traces for the following reasons:

i) The trace output is not indented to indicate how facts relate to one another.

ii) In particular, it is hard to piece together the shapes of the query plans which are being
evaluated.

I hope that this xml output is easier to read and understand. The output contains elements
which nest like the corresponding optimizer data structures:

A) statement - This is the text of an SQL statement which needs optimization.

B) queryBlock - A statement may have many query blocks. For instance, a UNION statement contains
many branches, each of which is its own query block. Most subqueries are also independent
query blocks. Each query block is optimized in isolation from the others. The isolation goes
so far that each query block gets its own optimizer instance.

C) joinOrder - For each query block, the optimizer may consider several join orders. These
are the left-to-right orders in which tables would be accessed at execution time. The optimizer
builds up a join order incrementally, starting from the leftmost position and adding more
tables as it goes. The optimizer may abandon a join order before it is completely filled out.
This happens if the optimizer determines that no completion of the join order can result in
a plan that's cheaper than the best plan found so far.

D) decoration - As the optimizer fills out the join order, it also considers which conglomerate
to use for each table and how to join the table to the table to its left. Of course, the leftmost
table doesn't join to anything before it, so for the leftmost table the join strategy is always
NESTED_LOOP.

The following other elements appear in the xml output:

E) planCost - The optimizer evaluates the costs of decorated join orders, including the costs
of decorated but partial join orders. The planCost element nests inside the joinOrder element.
In addition, each queryBlock contains a best planCost.

F) decConglomerateCost - This represents the cost of using a particular conglomerate to scan
a table. This element nests inside the decoration element.

In addition to presenting cost information, the planCost element presents two descriptions
of the decorated join order being evaluated:

1) summary - This is meant to be a compact, precise description of the plan which we might
consider using in an optimizer override. This description identifies conglomerates by the
(often cryptic) names stored in SYS.SYSCONGLOMERATES.

2) verbose - This is meant to be a more human-readable description of the plan. Tables are
identified by their SQL names or by their correlation names in the query. In addition, index
column names are included if the table is accessed by an index.

Although the optimizer considers how tables join leftward, English speakers will want to view
the join order rightward. That is how the descriptions are written. In addition, I have introduced
the following infix operators to represent the join strategies:

*               NESTED_LOOP
#               HASH_JOIN

I have also fully parenthesized the plan descriptions even though Derby only supports left-deep
parentheses today. In the future, Derby may support bushy join orders, requiring different
parenthesizing. Putting all of this together, here is a sample summary description:

((45b300a8-013f-33ba-d007-000003789be8 # b6b2c0ae-013f-33ba-d007-000003789be8) # 67bb80b4-013f-33ba-d007-000003789be8)
# d8cd40ba-013f-33ba-d007-000003789be8

...and here is the corresponding verbose description:

((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4

For the following query...

select tablename from sys.systables t, sys.syscolumns c, sys.sysaliases a, sys.syssequences
s
where t.tablename = c.columnname and c.columnname = a.alias and a.alias = s.sequencename

...here's a sample verbose plan description:

((C # T{TABLENAME,SCHEMAID}) # A{SCHEMAID,ALIAS,NAMESPACE}) # S{SCHEMAID,SEQUENCENAME}

I think that this xml output is much easier to read than traditional Derby optimizer traces.
If you use a browser like Firefox, you can collapse and expand elements in order to expose
just the information you want to see.

This patch also includes an optional tool (optimizerTracingViews) which gives you a SQL view
of all of the planCost elements in the xml output. Here's an example of how to use xml optimizer
tracing along with this optional viewing tool. Note that the tracer wants a file name argument
but the viewer wants a file url argument:

-- turn on xml-based optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', true, 'xml' );

-- run this query through the tracer
select * from tab1, tab2, tab3, tab4 where -tab1.b = tab2.b and tab2.a = tab3.a and tab3.a
= tab4.a;

-- turn off optimizer tracing
call syscs_util.syscs_register_tool( 'optimizerTracing', false, 'z.xml' );

-- load the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', true, 'file:///Users/rh161140/derby/mainline/z.xml'
);

-- view the costs of all complete plans
select estimatedCost, verbose from planCost
where complete
order by estimatedCost
;

-- unload the trace viewer
call syscs_util.syscs_register_tool( 'optimizerTracingViews', false );

Here is the output of the query against the planCost view:

ESTIMATEDCOST           |VERBOSE                                                         
                                                               
------------------------------------------------------------
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3                   
                                                               
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                   
                                                               
8595.596000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                   
                                                               
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB4) # APP.TAB3                   
                                                               
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                   
                                                               
9114.756000000001       |((APP.TAB1 # APP.TAB2) # APP.TAB3) # APP.TAB4                   
                                                               
18224.740429176647      |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1                   
                                                               
18224.740429176647      |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1                   
                                                               
27806.645417006024      |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1                   
                                                               
27806.645417006024      |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1                   
                                                               
27903.296430890066      |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1                   
                                                               
27903.296430890066      |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB4 # APP.TAB3) # APP.TAB2) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB4 # APP.TAB2) # APP.TAB3) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB3 # APP.TAB4) # APP.TAB2) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB3 # APP.TAB2) # APP.TAB4) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB2 # APP.TAB4) # APP.TAB3) * APP.TAB1                   
                                                               
28336.34888640299       |((APP.TAB2 # APP.TAB3) # APP.TAB4) * APP.TAB1                   
                                                               


I think this functionality is useful enough now that other people can test-drive it. Before
writing regression tests for this patch, I would like to gather feedback from developers about
how to improve this basic functionality. For instance, is this readable enough? Is there some
information from optimizer traces which you often use but which is missing from the xml output?

Further improvements could include:

I) Adding more information to the xml trace. Right now, I have only implemented a subset of
the trace methods.

II) Adding more SQL views for reading the xml trace output.


Touches the following files:

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/OptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/DefaultOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/Level2OptimizerImpl.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerImpl.java
M       java/testing/org/apache/derbyTesting/functionTests/tests/lang/DummyOptTrace.java

Changes to the signatures of some of the trace methods so that they give the xml tracer the
information it needs.

------------------

M       java/engine/org/apache/derby/iapi/sql/compile/JoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/NestedLoopJoinStrategy.java
M       java/engine/org/apache/derby/impl/sql/compile/HashJoinStrategy.java

Each JoinStrategy now has an operator symbol for use in planCost descriptions.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/XMLOptTrace.java
M       java/engine/org/apache/derby/impl/sql/compile/OptimizerTracer.java

The new xml trace logic.

------------------

A       java/engine/org/apache/derby/impl/sql/compile/OptTraceViewer.java
M       java/engine/org/apache/derby/catalog/Java5SystemProcedures.java
M       tools/jar/extraDBMSclasses.properties

The new optional tool for viewing xml trace output.

                  
> Make Optimizer trace logic pluggable.
> -------------------------------------
>
>                 Key: DERBY-6211
>                 URL: https://issues.apache.org/jira/browse/DERBY-6211
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.11.0.0
>            Reporter: Rick Hillegas
>            Assignee: Rick Hillegas
>         Attachments: derby-6211-01-aa-createPlugin.diff, derby-6211-02-aa-cleanup.diff,
derby-6211-02-ab-cleanup.diff, derby-6211-03-aa-customTracer.diff, derby-6211-04-aa-moveOptimizerTracerToEngineJar.diff,
derby-6211-05-aa-xmlOptimizerTracer.diff
>
>
> Right now the trace logic in the optimizer is hard-coded to produce a stream of diagnostics.
It would be good to be able to plug alternative trace logic into the optimizer. This would
make the following possible:
> 1) Plug in trace logic which produces formats which are easier to study and which can
be analyzed mechanically. E.g., xml formatted output.
> 2) Plug in trace logic which can be used during unit testing to verify that the optimizer
has picked the right plan. Over time this might make it easier to migrate canon-based tests
to assertion-based tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message