db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3880) NPE on a query with having clause involving a join
Date Fri, 31 Oct 2008 03:16:44 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644199#action_12644199

A B commented on DERBY-3880:

I looked a bit more at this issue and I think I've discovered at least part of the problem.
 A query like:

  select t1.i, avg(t2.i)
    from t1 inner join t2 on (t1.i = t2.i)
    group by t1.i having avg(t2.i) > 0;

has two aggregates in it, which means the compile time query tree has two AggregateNodes,
each of which has an "operand" that is a column reference.  The result column to which the
operand points is determined at bind time, and in this case the result column "t2.i" points
to a bind-time JoinNode.  The JoinNode's bind-time result column list is simply a concatenation
of the result column lists from its operands.  In our example, the query specifies T1 followed
by T2, so the JoinNode's bind-time RCL is: { T1.i, T1.c, T2.i, T2.c2, T2.i2 }.  Notice how,
with respect to this list, the position of "t2.i" is "3".  So when we bind the AggregateNodes,
each of them ends up with an operand that is a ColumnReference which points to the "t2.i"
column of the bind-time JoinNode, which in turn has virtual column id of "3".

After binding is complete we start optimization, the first phase of which is preprocessing.
 As part of SelectNode preprocessing we look at the FromList to see if there are any JoinNodes
that can be flattened.  In the above query the JoinNode *can* be flattened, which means it
effectively disappears.  That in turn means that any result columns in the SELECT list have
to be updated to reflect the fact that they no longer point to the JoinNode; instead, they
point directly to the underlying base tables.  The process of flattening the JoinNode and
updating the SelectNode's result column list begins with the following call in SelectNode.preprocess():

    // Flatten any flattenable FromSubquerys or JoinNodes

Notice how we pass the SelectNode's RCL into the flattenFromTables method, but we do *not*
pass the HAVING clause.  Long story short, the AggregateNode in the SelectNode's RCL is properly
updated to reflect the removal of the JoinNode--but the AggregateNode in the HAVING clause
continues to point to a result column from the now-useless JoinNode.  The *real* column to
which the Aggregate should be pointing is actually buried *beneath* the JoinNode's ResultColumn,
so the aggregate's operand should have been remapped to point to the buried column. (And indeed,
that's exactly what happens with the aggregate in the SelectNode's RCL.)

When optimization is done and we modify the tree, result columns are updated to reflect the
final join order and also to remove any un-referenced columns.  But since the HAVING aggregate
node is still pointing to the unused JoinNode result column, it doesn't see any changes--it
continues to point to a ResultColumn that has a virtual column id of "3" and which points
to a JoinNode that is no longer valid.

The final query plan inclues a *new* JoinNode between T1 and T2, and that JoinNode's RCL only
contains columns which are referenced in the query.  Thus the execution-time JoinNode result
column list is, for the above query, { T1.i, T2.i }.  Note how it only has TWO columns in
it, not five like the bind-time JoinNode.  So when it comes time to execute the query, the
aggregate in the having clause ends up pointing to column "3" of the new JoinNode's RCL, and
since that doesn't exist, the column value is set to null (as Kathey noted in earlier comments).
 Attempts to reference that column for GROUP BY sorting, then, result in the NPE.

The reason the same query works if the two aggregates reference two _different_ columns in
T2 has nothing to do with equivalence--that was a red herring, sorry for the miscue.  The
reason the query passes appears to have something to do with referenced columns--esp. the
SelectNode (which includes the Select RCL and the Having clause) then references TWO columns
from T2 plus ONE column from T1, leading to THREE columns in the execution JoinNode's RCL:
{ T1.i, T2.i, T2.i2 }.  So now the RCL has three columns instead of two, which means the result
column in the HAVING aggregate, which still has the INCORRECT virtual id of "3", accidentally
points to a valid column, and thus the query passes--but that was just chance, the code is
still technically wrong.

This also explains why the original query passes if the user does NOT specify "inner join".
 The inclusion of "inner join" leads to the creation of the bind-time JoinNode between T1
and T2, against which the aggregates are then bound.  In preprocessing we realize the JoinNode
is useless so we flatten it out of the query, and that causes the aforementioned problem.
 But if the query does not have "inner join", the bind-time JoinNode is never created, which
means both aggregate nodes already point directly to the underlying base tables.  Throughout
optimization and query generation the base table result columns are properly updated, which
means both aggregates end up pointing to the correct place, and all is well.

So all of that said, I think the fix for this problem is to somehow ensure that the HAVING
clause's AggregateNode is correctly remapped when/if necessary, esp. if it points to a JoinNode
that has been flattened out.  I happened to notice that, when the JoinNode is flattened, all
of its result columns are marked as "redundant" as part of the flattening process (see JoinNode.flatten()).
 So the quick (and possibly NOT complete) fix that I found was to add an "if" statement to
the getNewExpressionResultColumn() method of AggregateNode.java.  That method is called as
part of "modification of access paths"--i.e. after optimization has completed--and is used
for creating the result column that will ultimately be used for aggregate input at execution
time.  I found that if I added logic to remap the Aggregate's operand (if it's a column reference),
then the posted query starts running without error:

Index: java/engine/org/apache/derby/impl/sql/compile/AggregateNode.java
--- java/engine/org/apache/derby/impl/sql/compile/AggregateNode.java    (revision 707244)
+++ java/engine/org/apache/derby/impl/sql/compile/AggregateNode.java    (working copy)
@@ -539,6 +539,26 @@
                        this.getNewNullResultExpression() :

 +        /* The operand for this aggregate node was initialized at bind
 +         * time.  Between then and now it's possible that certain changes
 +         * have been made to the query tree which affect this operand. In
 +         * particular, if the operand was pointing to a result column in
 +         * a JoinNode and then that JoinNode was flattened during pre-
 +         * processing, all of the references to that JoinNode--including
 +         * this aggregate's operand--need to be updated to reflect the
 +         * fact that the Join Node no longer exists.  So check to see if
 +         * the operand is a column reference, and if so, make a call to
 +         * remap it to its underlying expression.  If nothing has happened
 +         * then this will be a no-op; but if something has changed to void
 +         * out the result column to which the operand points, the result
 +         * column will be marked "redundant" and the following call should
 +         * remap as appropriate. DERBY-3880.
 +         */
 +        if (operand instanceof ColumnReference)
 +        {
 +            ((ColumnReference)operand).remapColumnReferencesToExpressions();
 +        }
            return (ResultColumn) getNodeFactory().getNode(
                                                                "##aggregate expression",

I didn't run any other tests, and I admit that this change may not be sufficient (or it may
be too general), but it does cause the reported query to run without error, so at the very
least it demonstrates the problem.  There could very well be a better way to fix it...

> NPE on a query with having clause involving a join
> --------------------------------------------------
>                 Key: DERBY-3880
>                 URL: https://issues.apache.org/jira/browse/DERBY-3880
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions:
>         Environment: Windows 2003 Server 
>            Reporter: Venkateswaran Iyer
>            Priority: Minor
>         Attachments: AggregateExpressionResultColumn.txt, querytree_fail.txt, querytree_works.txt
> A simple query involving a join and having clause causes a NPE. Any subsequent executions
cause severe errors. It almost looks like the underlying connection was closed out.
> ====
> C:\apps\derby\db-derby-\db-derby-\bin>ij
> ij version 10.4
> ij> connect 'jdbc:derby://speed:1527/ClassicModels;user=sa;password=sa
> ';
> ij> create table t1(i int, c varchar(20));
> 0 rows inserted/updated/deleted
> ij> create table t2(i int, c2 varchar(20), i2 int);
> 0 rows inserted/updated/deleted
> ij> insert into t1 values(1, 'abc');
> 1 row inserted/updated/deleted
> ij> insert into t1 values(2, 'abc');
> 1 row inserted/updated/deleted
> ij> insert into t2 values(1, 'xyz', 10);
> 1 row inserted/updated/deleted
> ij> insert into t2 values(1, 'aaa', 20);
> 1 row inserted/updated/deleted
> ij> insert into t2 values(2, 'xxx', 30);
> 1 row inserted/updated/deleted
> ij> select t1.i, avg(t2.i2) from t1 inner join t2 on (t1.i = t2.i) group by t1.i
>  having avg(t2.i2) > 0;
> ERROR XJ001: DERBY SQL error: SQLCODE: -1, SQLSTATE: XJ001, SQLERRMC: java.lang.
> NullPointerException¶¶XJ001.U

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message