db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4071) AssertFailure when selecting rows from a table with CHARACTER and VARCHAR columns
Date Tue, 10 Mar 2009 05:52:50 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Dag H. Wanvik updated DERBY-4071:
---------------------------------

    Attachment: derby-4071.stat
                derby-4071.diff

I upload a patch, derby-4071, which solves the immediate problem, I
think. Running regressions now.

It contains a patch to GroupByNode as well as a new test case for
GroupByTest.

It essentially defers the replacement of group by expressions in the
having clause with pointers to the appropriate result column in the
group by node. The replacement used to happen at the end of
GroupByNode#addUnAggColumns. The patch moves substitution to after the
call to GroupByNode#addAggregateColumns has been performed. 

Explanation of the error
-----------------------

The substitution is done by a SubstituteExpressionVisitor, which
replaces all occurences of the group by expression as described
above. In our case, however, this group by expression is an argument
to an aggregate function, so the having clause contains an
AggregateNode whose operand is the group by column. When the visitor
gets there, the aggregate node's operand is replaced as described,
cf. UnaryOperatorNode#accept (AggregateNode is a subclass).

There is a snag, however. This AggregateNode is aliased by the
aggregateNode we find in GroupByNode#aggregateVector. And we are not
done with using the information in aggregateVector yet when
addUnAggColumns has run.

Notably, in addAggregateColumns, the information on the aggregates are
needed, and the substitution described above gets in the way:

When constructing the aggregate expression (input) columns, there is a
call to 

     aggregate.getNewExpressionResultColumn(dd)

This uses the operand field of the AggregateNode, whose value has just
been replaced for the purposes of the having clause. So, we end up
with a result column for aggregate expression which is wrong. The 
result column list of GroupByNode ends up looking like this:

RCL (result column list) 
   [0]: #UnaggColumn (the group by column)
      \
       CR
        \
        RC
          \
          VCN
            \
            RC (basetable)
           

   [1]: ##aggregate result
   [2]: ##aggregate expression
      \
      CR
       \
       RC
         \
         VCN
           \
           RCL[0] above, *error* That is, the RC of the group by node!

   [3]: aggregator


If things were OK we would expect to see:


RCL
   [0]: #UnaggColumn (the group by column)
       \
        :
   [1]: ##aggregate result
   [2]: ##aggregate expression
      \
      CR
       \
       RC (underlying ProjectRestrict)
         \
         CR         
           \
           RC (bottom ProjectRestrict)
            \
            VCN 
              \
              RC (basetable)
   [3]: aggregator

In the the underlying ProjectRestrictNode, which needs to set up the
mapArray to locate the correct column in the underlying base
table, this creates havoc:

The underlying PRN calls RCL.mapSourceColumns for every RC in the
ProjectRestrictNode's RCL (strip off a CR-RC level from GroupByNode's
RCL to picture ProjectRestrictNode's RCL).

mapSourceColumns extracts the virtual column id from a CR or a VCN. For
column 2 of the underlying ProjectRestrictNode, it sees

      [2]:
         \
         VCN
           \
           RCL[0]

and there it finds the virtual column number of 1, where we should
have seen:

      [2]:
         \
         CR         
           \
           RC
            \
            VCN
              \
              RC (basetable)

and found the virtual column number 2. 

With the patch, the RCLs end up as expected and the repro works.

Since the substitution "damages" the AggregateNode in the
aggregateVector, it struck me that any later usage might also be
affected, even with the patch. The aggregateVector is actually being
used again later, in considerPostOptimizeOptimizations. However, that
code only runs if there is no explicit group by, and max 1 aggregate
function which must be max/min, so I am not sure if this would ever be
an issue. And if so, it would only bar an optimization, not give a
wrong result...


> AssertFailure when selecting rows from a table with CHARACTER and VARCHAR columns
> ---------------------------------------------------------------------------------
>
>                 Key: DERBY-4071
>                 URL: https://issues.apache.org/jira/browse/DERBY-4071
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Aaron Digulla
>            Assignee: Dag H. Wanvik
>         Attachments: derby-4071.diff, derby-4071.stat, DerbyTest.java, trialPatch.diff
>
>
> When running a complex query on this table:
> [code]
> Create table DEMO.TEST (
>     CHR                            CHARACTER(26)                   ,
>     VCHR                           VARCHAR(25)                     )
> [code]
> then I get this exception:
> AssertFailure: ASSERT FAILED col1.getClass() (class ...SQLChar) expected to be the same
as col2.getClass() (class ....SQLVarchar)' was thrown while evaluating an expression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message