db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4) "order by" is not supported for "insert ... select"
Date Wed, 23 May 2007 04:56:16 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Bryan Pendleton updated DERBY-4:

    Attachment: samples.ij

It turns out that I had a major misunderstanding of the control
flow of INSERT ... SELECT and how it interacts with generated
identity columns.

Consider the original example from this issue:

  create table temp1(s varchar(10));
  insert into temp1 values 'x','a','c','b','a';
  create table temp2(i integer not null generated always as identity,
                   s varchar(10));
  insert into temp2(s) select s from temp1 order by s;

An INSERT ... SELECT statement has 3 basic stages of processing:
 - the SELECT statement retrieves rows of data
 - each retrieved row is transformed into the form needed by the
   target table
 - each row is then inserted into the target table

>From a QueryTreeNode point of view, there is an InsertNode, which
points to a NormalizeResultSetNode, which in turn points either to
a UnionNode (in the case of INSERT ... VALUES ...), or to a
ProjectRestrictNode/SelectNode tree (in the case of INSERT ... SELECT ...).

My assumption when I started investigating this issue was that
the values of the identity column were generated during the 3rd
stage, at the time of insert into the target table, by processing
that would occur in InsertResultSet.java

Or, at least, that the identity values were generated during
the 2nd stage, as part of projecting and normalizing the source
data into the form needed by the target table, by processing that
would occur in NormalizeResultSet.java.

Unfortunately, after stepping through the code in detail, I've
come to the conclusion that the identity column values are
actually generated during the first stage, when the rows of
data are retrieved by the SELECT and are being fed *into* the sorter.

Thus, the identity column's values are generated *prior* to the
sort, and hence there is no guarantee that the order of the
values bears any relationship to the ORDER BY clause.

At the instant when the new value of the identity column is
being generated for the row that we're processing, the stack looks
like this:
     <generated bytecode method "e1">

Note that we are still loading the sorter at this point; the sort
has not yet been performed.

My conclusion at this point is that I would have to make some much
more substantial changes to INSERT ... SELECT in order to get it
to perform the identity column generation *after* the ORDER BY
processing had occurred.

I'm attaching an updated version of the patch that I was working with,
and an updated version of the sample SQL statements that I was
trying to run. The patched code has some interesting behaviors; it
definitely performs the ORDER BY processing faithfully.

It just doesn't provide the identity column guarantees that
I was interested in exploring.

> "order by" is not supported for "insert ... select"
> ---------------------------------------------------
>                 Key: DERBY-4
>                 URL: https://issues.apache.org/jira/browse/DERBY-4
>             Project: Derby
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Christian d'Heureuse
>         Assigned To: Bryan Pendleton
>            Priority: Minor
>         Attachments: insertOrderBy.diff, insertOrderBy_v2.diff, insertOrderBy_v3.diff,
samples.ij, samples.ij
> When filling a table with "insert ... select ...", "order by" cannot be specified.
> There is not method to copy a table sorted into another table (except using export/import).
This would be useful to optimize performance for big tables, or to create identity values
that are ascending (related to another column).
> Example:
> create table temp1 (
>    s varchar(10));
> insert into temp1 values 'x','a','c','b','a';
> create table temp2 (
>    i integer not null
>       generated always as identity
>       primary key,
>    s varchar(10));
> insert into temp2 (s)
>    select s from temp1 order by s;
> --> Error: "order by" is not allowed.
> -- trying to use "group by" instead of "oder by":
> insert into temp2 (s)
>    select s from temp1 group by s;
> select * from temp2;
> --> "group by" did not sort the table.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message