db-derby-dev mailing list archives

From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-481) implement SQL generated columns
Date Mon, 27 Oct 2008 16:32:44 GMT

     [ https://issues.apache.org/jira/browse/DERBY-481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rick Hillegas updated DERBY-481:

    Attachment: derby-481-04-aa-insert.diff

Attaching derby-481-04-aa-insert.diff. I am running regression tests now.

This patch wires in INSERT support for generated columns. I threaded my way through the INSERT
machinery largely by following the way that CHECK constraints are handled.

Before this patch, the compiler built 2 significant methods for evaluating expressions:

1) A method which populates the base row from whatever data source is driving the INSERT.
That data source could be, for instance, a list of literal values or a SELECT statement.

2) A method which runs the CHECK constraints.

My first attempt to support INSERT involved building the generation clauses into method (1).
Unfortunately, that method is generated by the data sources, not by the driving INSERT node.
I got this approach to work for the degenerate case of inserting a single literal value. But
this approach failed when I tried to insert multiple literal values (where the data source
is a UNION) and it failed when the data source was a SELECT. It became apparent that this
approach would involve wiring code-generation logic into all implementations of ResultSet--there
are quite a few. This began to look too complicated so I abandoned this approach.

The current patch represents a second attempt. Here the approach is to give the generation
clauses their own method. Now the compiler builds 3 significant methods for evaluating expressions:

1') The original method which populates the base row from a data source (see above).

2') A new method which runs the generation clauses, looking for referenced columns in the
row built by (1') and poking the generated values into that row.

3') The original method which runs the CHECK constraints (see above).

That was the tricky bit for compilation.
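
As a rough illustration of the three-method split (this is not code from the patch), the sketch
below models each compiled method as a plain Java functional interface. The class name, the
Object[] row representation, and the example table with its CHECK constraint are all assumptions
made for the illustration; the real methods are generated bytecode attached to the Activation.

```java
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical model, not Derby code: a row is an Object[] and each of the
// three compiled methods is represented by a functional interface.
class InsertPipelineSketch {

    /** Runs the three stages in the order (1'), (2'), (3') and returns the finished row. */
    static Object[] insertRow(Supplier<Object[]> populateBaseRow,       // (1')
                              Consumer<Object[]> runGenerationClauses,  // (2')
                              Predicate<Object[]> runCheckConstraints)  // (3')
    {
        Object[] row = populateBaseRow.get();
        // (2') reads referenced columns from the row and pokes generated values into it.
        runGenerationClauses.accept(row);
        // (3') sees the row with the generated values already filled in.
        if (!runCheckConstraints.test(row)) {
            throw new IllegalStateException("CHECK constraint violated");
        }
        return row;
    }

    /**
     * Assumed example table:
     *   T( refCol INT, generatedCol GENERATED ALWAYS AS ( refCol * 2 ), CHECK ( generatedCol < 100 ) )
     */
    static Object[] demo(int refCol) {
        return insertRow(
            () -> new Object[] { refCol, null },          // data source supplies refCol only
            row -> row[1] = ((Integer) row[0]) * 2,       // generation clause
            row -> ((Integer) row[1]) < 100);             // CHECK constraint
    }
}
```

Keeping (2') separate from (1') is what lets the data source stay unaware of generation clauses:
whatever produces the base row, the generation step runs against the finished row afterwards.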

The tricky bit for execution was this: the base row has to be poked into the Activation so
that it is visible to the generation clauses when (2') runs. A similar poking is done for
CHECK constraints. If you examine this poking for CHECK constraints, you will notice that
sometimes the poking is undone after the constraints run and sometimes we don't bother to
undo the poking. I don't understand the difference between these code paths. As a result,
I have defensively coded the new poking which we need for generated columns. I poke the base
row into the Activation just before the generation clauses run. After the generation clauses
run, I return the Activation to its previous state.
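
The defensive save-and-restore described above could look roughly like the following sketch. It
is a simplification under stated assumptions: ActivationSketch is a hypothetical stand-in with a
single current-row slot, its method names are invented for the illustration, and the use of
try/finally is one way to guarantee the restore, not necessarily what the patch does.

```java
// Hypothetical stand-in for the Activation: it models only a single
// "current row" slot. The real interface has many more responsibilities.
class ActivationSketch {
    private Object[] currentRow;

    Object[] getCurrentRow() { return currentRow; }

    void setCurrentRow(Object[] row) { currentRow = row; }

    /**
     * Makes baseRow visible while the generation clauses run, then puts
     * back whatever row was there before, even if the clauses throw.
     */
    void runWithRow(Object[] baseRow, Runnable generationClauses) {
        Object[] saved = getCurrentRow();   // remember the previous state
        setCurrentRow(baseRow);             // poke the base row in
        try {
            generationClauses.run();
        } finally {
            setCurrentRow(saved);           // undo the poke
        }
    }
}
```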

Here is a little more detail on the implementation:

A) At bind() time we do the following:

i) Prune out explicit mentions of generated columns. These can arise if the user sets a generated
column to the literal DEFAULT--as allowed by the ANSI/ISO syntax. So for instance, the following
is legal:

  insert into T( refCol, generatedCol ) values ( 1, default )

We prune out the explicitly added generated columns because, later on in the bind() phase,
the insert list is expanded to include all columns with defaults (not just generated columns).

ii) When the insert list is expanded to include all defaulted columns, we add in the generated
columns but we don't bind their expressions. This is because the generation clause may refer
to other columns in the base row, which creates an ordering problem. In addition, we don't
yet have a result set number for the base row--we need that number in order to bind references
to other columns which may appear in the generation clauses.

iii) Later on, just before we parse and bind the CHECK constraints, we parse and bind the
generation clauses. At this point, we have enough context to bind the referenced columns.

B) At generate() time, we generate method (2') in between generating (1') and (3'). The generated
(2') method is now one of the arguments to the factory method which creates the execution-driver,
the InsertResultSet. This is just like what we do for CHECK constraints: the generated (3')
method is also an argument to the instantiation of the InsertResultSet.

C) At execution time, we evaluate (2') just before we evaluate (3').

Touches the following files:


M      java/engine/org/apache/derby/impl/sql/compile/ResultColumn.java

Adds a method so that a ResultColumn can report whether it represents a generated column.
I also forced all overrides of the expression field to go through the setExpression() method.
This, technically speaking, is not necessary--but it made debugging easier for me and I think
it will be useful for other developers who need to debug this node.

M      java/engine/org/apache/derby/impl/sql/compile/DMLModStatementNode.java

Changes are made to support both binding and code-generation. These are the bind() changes:

i) Adds a method which raises an error if the user tries to override the value in a generated
column with any value other than the DEFAULT literal. For instance, the following is illegal:

  insert into T( refCol, generatedCol ) values ( 1, 70 )

In addition, we remove explicit mentions of generated columns because we will add them back
when we enhance the INSERT statement with defaulted columns.

ii) Adds logic to parse and bind generated columns. This is modelled on the logic which parses
and binds CHECK constraints.

iii) Renames bindCheckConstraint() to bindRowScopedExpression() because this method is now
shared by the logic which binds CHECK constraints and the logic which binds generation clauses.
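
The check-and-prune step in (i) can be sketched as follows. This is only an illustration of the
rule, not the patch's code: the class and method names, the DEFAULT marker object, and the
representation of the insert list as parallel lists of column names and values are all
assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the bind-time handling of explicitly mentioned
// generated columns. Not Derby code.
class GeneratedColumnBindSketch {

    /** Marker standing in for the DEFAULT literal in a VALUES list. */
    static final Object DEFAULT = new Object();

    /**
     * Returns the target columns that remain after pruning generated columns.
     * Throws if a generated column is given any value other than DEFAULT.
     */
    static List<String> pruneGeneratedColumns(List<String> columns,
                                              List<Object> values,
                                              Set<String> generatedColumns) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < columns.size(); i++) {
            String name = columns.get(i);
            if (!generatedColumns.contains(name)) {
                kept.add(name);
                continue;
            }
            if (values.get(i) != DEFAULT) {
                throw new IllegalArgumentException(
                    "Cannot override generated column " + name);
            }
            // Generated column set to DEFAULT: drop the explicit mention. It is
            // added back later, when the insert list is expanded with defaults.
        }
        return kept;
    }
}
```

With the two statements quoted in this message, ( 1, default ) would leave only refCol in the
list, while ( 1, 70 ) would be rejected.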

M      java/engine/org/apache/derby/impl/sql/compile/ResultSetNode.java

Short-circuits the logic which enhances the base row with defaulted columns. Adds in the generated
columns but does not add their generation clauses. This is because the clauses cannot be bound
at the same time as the rest of the columns in the base row. We wait to bind them until the
time that we bind CHECK constraints.

M      java/engine/org/apache/derby/impl/sql/compile/InsertNode.java

Wires binding and code-generation calls into bindStatement() and generate().


M      java/engine/org/apache/derby/impl/sql/compile/ResultColumnList.java

Skips code-generation for generated columns when walking the base row. The generateCore()
method generates (1'). We need to build the generation clauses into (2') instead and this
is done later on.

M      java/engine/org/apache/derby/impl/sql/compile/DMLModStatementNode.java

In addition to the bind() changes described above, adds logic to generate the (2') method.


M      java/engine/org/apache/derby/iapi/sql/execute/ResultSetFactory.java
M      java/engine/org/apache/derby/impl/sql/execute/GenericResultSetFactory.java

Adds (2') as an argument to the factory method which instantiates InsertResultSets.

M      java/engine/org/apache/derby/iapi/sql/Activation.java
M      java/engine/org/apache/derby/impl/sql/execute/BaseActivation.java
M      java/engine/org/apache/derby/impl/sql/GenericActivationHolder.java

Adds a method for retrieving the current row from the Activation. This allows us to return
the Activation to its original state after we have run (2').

M      java/engine/org/apache/derby/impl/sql/execute/InsertResultSet.java
M      java/engine/org/apache/derby/impl/sql/execute/NoRowsResultSetImpl.java

Evaluates generation clauses close to where CHECK constraints are evaluated.

M      java/testing/org/apache/derbyTesting/functionTests/tests/lang/GeneratedColumnsTest.java

Uncomments basic INSERT tests.

> implement SQL generated columns
> -------------------------------
>                 Key: DERBY-481
>                 URL: https://issues.apache.org/jira/browse/DERBY-481
>             Project: Derby
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions:
>            Reporter: Rick Hillegas
>            Assignee: Rick Hillegas
>         Attachments: derby-481-00-aa-prototype.diff, derby-481-01-aa-catalog.diff, derby-481-02-aa-utilities.diff,
>                      derby-481-03-aa-grammar.diff, derby-481-04-aa-insert.diff, GeneratedColumns.html
>
> Satheesh has pointed out that generated columns, a SQL 2003 feature, would satisfy the
> performance requirements of Expression Indexes (bug 455). Generated columns may not be as
> elegant as Expression Indexes, but they are easier to implement. We would allow the following
> new kind of column definition in CREATE TABLE and ALTER TABLE statements:
>     columnName GENERATED ALWAYS AS ( expression )
> If expression were an indexableExpression (as defined in bug 455), then we could create
> indexes on it. There is no work for the optimizer to do here. The Language merely has to compute
> the generated column at INSERT/UPDATE time.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
