db-derby-dev mailing list archives

From "A B (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3880) NPE on a query with having clause involving a join
Date Wed, 12 Nov 2008 01:47:44 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646775#action_12646775 ]

A B commented on DERBY-3880:

Hi Bryan,

Thanks for your interest in this issue.

> Army says "so the JoinNode's bind-time RCL is: { T1.i, T1.c, T2.i, T2.c2, T2.i2 }" but
> I don't see how that is [...] I'm having trouble seeing why T1.c and T2.c2 are in the join node's

I think the key observation here is in the sentence immediately preceding the one that you
quoted, namely:

  The JoinNode's bind-time result column list is simply a concatenation of the result column
  lists from its operands.

Due to the specification of "inner join" the parser will create a parse-time JoinNode whose
left operand is a FromBaseTable representing T1 and whose right operand is a FromBaseTable
representing T2.  Note how neither operand has a ProjectRestrictNode above it, which in turn
means we will *not* project out any columns (at this point).  Thus the FromBaseTable for T1
has an RCL with *ALL* of T1's columns in it, and the FromBaseTable for T2 has an RCL with
*ALL* of T2's columns in it.  The fact that some columns are never referenced in the query
has no effect on the RCLs for the FromBaseTables at this point in binding.

To verify, you can put a breakpoint in JoinNode.buildRCL() and, when the breakpoint hits,
look at the leftOperand and rightOperand.  They will both be FromBaseTables and the RCLs will
have all columns that exist in the underlying base tables.  So the JoinNode's RCL then becomes
a concatenation of all columns from T1 and all columns from T2, hence { T1.i, T1.c, T2.i,
T2.c2, T2.i2 }--even though T1.c and T2.c2 are never referenced in the query.
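To make the concatenation concrete, here is a toy model in plain Java. This is *not* Derby's actual code (JoinNode.buildRCL() operates on ResultColumnList objects, not strings); it just illustrates the point that the bind-time RCL is the left operand's full column list followed by the right operand's, with no projection applied:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of JoinNode's bind-time RCL construction: a plain
// concatenation of the operands' column lists, unprojected.
public class JoinRclSketch {
    static List<String> buildJoinRcl(List<String> leftRcl, List<String> rightRcl) {
        List<String> joined = new ArrayList<>(leftRcl);
        joined.addAll(rightRcl);
        return joined;
    }

    public static void main(String[] args) {
        // All columns of T1 and T2, regardless of which ones the query references.
        List<String> t1 = List.of("T1.I", "T1.C");
        List<String> t2 = List.of("T2.I", "T2.C2", "T2.I2");
        System.out.println(buildJoinRcl(t1, t2));
        // prints [T1.I, T1.C, T2.I, T2.C2, T2.I2]
    }
}
```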

> I anticipated that I was going to see a fix which involved passing the HAVING clause
> expressions to fromList.flattenFromTables() so that it could properly maintain the
> ColumnReference data in the HAVING clause during flattening. 

To be honest, I think you're absolutely right: that seems like a more robust approach to the
problem.  See below...

> if it's safe to allow the ColumnReference in the aggregate of a HAVING clause
> to "decay" (become inaccurate) during optimization, and then fix it up at the end,
> why do we have to re-map the ColumnReference information for the SELECT's
> RCL?

Good question.  I do not think it is generally safe to assume that a temporarily inaccurate
ColumnReference is going to be okay.  The vast majority of the ColumnReference remapping that
occurs during optimization happens in order to support the pushing of predicates down the
tree and/or to allow the insertion of query tree nodes into an existing tree.  In both cases
the column reference is being remapped from one valid ResultColumn to another valid ResultColumn.
In other words, we can't just look at the ColumnReference after optimization has finished
and say "pick the only one that's valid", because there could be an arbitrarily long chain
of ResultColumns, VirtualColumnNodes, and ColumnReferences, all of which are valid and referenceable
by different parts of the query tree.  The only way to know which one is the "right" one for
a given ColumnReference is to remap (or un-remap) in "real time"--e.g. when we push/pull a
predicate, we have to remap its column reference as part of the push/pull.
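A minimal sketch of this "real time" discipline, again as a toy model rather than Derby's real classes (the names ResultColumn and ColumnReference are borrowed loosely, and the source/history fields are illustrative assumptions): a reference points at exactly one link in the chain at a time, pushing a predicate down retargets it one link down, and pulling the predicate back up must undo exactly that step.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of remap/un-remap: every ResultColumn in the chain is
// valid for *some* node, so only stepwise remapping at push/pull
// time tells us which one is right for this particular reference.
public class RemapSketch {
    static class ResultColumn {
        final String owner;
        final ResultColumn source;  // next link down the tree, or null
        ResultColumn(String owner, ResultColumn source) {
            this.owner = owner;
            this.source = source;
        }
    }

    static class ColumnReference {
        ResultColumn target;
        final Deque<ResultColumn> history = new ArrayDeque<>();
        ColumnReference(ResultColumn target) { this.target = target; }

        // remap: predicate pushed down, so follow the chain one level
        void remap()   { history.push(target); target = target.source; }
        // un-remap: predicate pulled back up, so restore the old target
        void unRemap() { target = history.pop(); }
    }

    public static void main(String[] args) {
        ResultColumn base = new ResultColumn("FromBaseTable", null);
        ResultColumn prn  = new ResultColumn("ProjectRestrictNode", base);
        ColumnReference ref = new ColumnReference(prn);

        ref.remap();                          // pushed: now points into the base table
        System.out.println(ref.target.owner); // prints FromBaseTable
        ref.unRemap();                        // pulled: restored to the node above
        System.out.println(ref.target.owner); // prints ProjectRestrictNode
    }
}
```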

That said, the fact that a post-optimization remapping of the HAVING aggregate's ColumnReference
corrected the problem was a pleasant surprise to me, as I too was more inclined to believe
that, by then, it would be too late.  But I made the change, saw that it made sense and that
it fixed the immediate problem, and that's what I posted.  Having done so, I think it's worth
it to echo the final few sentences from that same comment, namely:

  I admit that this change may not be sufficient (or it may be too general), but it does
  cause the reported query to run without error, so at the very least it demonstrates
  the problem. There could very well be a better way to fix it...

I posted a quick fix to demonstrate the problem and stopped there.  Kathey was kind enough
to pick it up and take it to completion.  But at a higher level, I think you're right: there's
probably a better (and safer) way to fix this bug, and your suggestion that such a fix would
involve passing the HAVING clause expressions to fromList.flattenFromTables() seems like a
good one to me...

> I've noticed that (I think) the actual code that is generated to execute the query may
> contain multiple computations of the same expressions. That is, when this query
> runs, is it true that we are actually going to *compute* AVG(t2.i) twice?

I have not looked at this in any detail, but I believe the answer is "Yes", at least based
on the tracing I was doing when I was trying to track down the cause of this issue.  But I
wasn't specifically investigating that particular behavior at the time, so it's possible I
missed something...

> NPE on a query with having clause involving a join
> --------------------------------------------------
>                 Key: DERBY-3880
>                 URL: https://issues.apache.org/jira/browse/DERBY-3880
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions:
>         Environment: Windows 2003 Server 
>            Reporter: Venkateswaran Iyer
>            Assignee: A B
>            Priority: Minor
>             Fix For:
>         Attachments: AggregateExpressionResultColumn.txt, querytree_fail.txt, querytree_works.txt
> A simple query involving a join and having clause causes a NPE. Any subsequent executions
> cause severe errors. It almost looks like the underlying connection was closed out.
> ====
> C:\apps\derby\db-derby-\db-derby-\bin>ij
> ij version 10.4
> ij> connect 'jdbc:derby://speed:1527/ClassicModels;user=sa;password=sa';
> ij> create table t1(i int, c varchar(20));
> 0 rows inserted/updated/deleted
> ij> create table t2(i int, c2 varchar(20), i2 int);
> 0 rows inserted/updated/deleted
> ij> insert into t1 values(1, 'abc');
> 1 row inserted/updated/deleted
> ij> insert into t1 values(2, 'abc');
> 1 row inserted/updated/deleted
> ij> insert into t2 values(1, 'xyz', 10);
> 1 row inserted/updated/deleted
> ij> insert into t2 values(1, 'aaa', 20);
> 1 row inserted/updated/deleted
> ij> insert into t2 values(2, 'xxx', 30);
> 1 row inserted/updated/deleted
> ij> select t1.i, avg(t2.i2) from t1 inner join t2 on (t1.i = t2.i) group by t1.i
>  having avg(t2.i2) > 0;
> ERROR XJ001: DERBY SQL error: SQLCODE: -1, SQLSTATE: XJ001, SQLERRMC: java.lang.
> NullPointerException
> XJ001.U

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
