db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3257) SELECT with HAVING clause containing OR conditional incorrectly return 1 row - should return 2 rows - works correctly with 10.2 DB
Date Fri, 18 Jan 2008 18:39:34 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560494#action_12560494
] 

A B commented on DERBY-3257:
----------------------------

> I think I need a better understanding of why it is necessary for the havingClause
> to be in CNF before the call to preprocess

This is a good question--and perhaps in the long run this is not requirement per se.  The
only reason I thought to normalize the havingClause is because the OrNode.preprocess() method
expects that the OrNode has been normalized.  I.e. it expects that the predicate:

  [not normalized] =>  (t0.CODE = 'GBR' OR t0.CODE = 'CHA')

becomes

  [normalized] => (t0.CODE = 'GBR' OR ((t0.CODE = 'CHA') OR FALSE))

When it transforms the OR list into an IN list, then, it just walks the tree and grabs the
left operand from all of the chained ORs.  So in the normalized case we grab:

  1. t0.CODE = 'GBR'
  2. t0.CODE = 'CHA'

>From those two predicates we then get "t0.CODE in ('GBR', 'CHA')", which is correct. 
If the HAVING clause is not normalized, though, then the logic in OrNode will only grab:

  1. t0.CODE = 'GBR'

because there is only one OrNode and that's its left operand.   So we end up with "t0.CODE
in ('GBR')" and thus we miss the row for 'CHA'.

All of that said, maybe normalizing the HAVING clause is not the best solution, or perhaps
it goes too far?  Maybe OrNode should be changed to recognize if it is NON-normalized and
should not try to transform itself in that case?  If that's true, the follow-up question would
be: Are there other places in the code that expect the having clause to be normalized, as
well?

It seems to me that something in DERBY-681 has exposed the HAVING clause to processing that
was previously only applied to WHERE clauses.  If that's true, and if the change was intentional,
then the HAVING clause should (in theory) have to go through the same pre-processing steps
as the WHERE clause--which includes normalization.  If that change in behavior was not intentional,
though, then maybe the fix for this issue is to somehow prevent WHERE-clause -specific processing
from happening on the HAVING clause.  I don't know for sure, but that last option sounds pretty
hard...

> is it a problem to avoid [normalization] for SubqueryNodes?

I would follow this up with two other questions:

  1. What it is about normalization that causes problems for the SubqueryNode?  It looks like
the 42X24 error comes because the result column referenced within the subquery has a "source
level" that is the same as the outer query--and that is (apparently?) because the SuqbueryNode
somehow "disappears" as a result of the normalization.  That suggests that perhaps the Subquery
is being flattened during, or as a result of, normalization.  Without normalization, the SubqueryNode
hangs around and its result column gets a source level that is different from the outer query,
so the check in GroupByNode.addNewColumnsForAggregation() passes.

  2. What happens if the same nested query appears in a WHERE clause and the WHERE clause
is normalized?  Will the subquery be flattened during or after normalization?

These two questions don't answer yours, but perhaps some investigation of them will lead to
an answer.  If the answer is "Yes, it's okay to avoid normalization for SubqueryNodes", then
that leads to yet another question: are there other nodes for which it is okay to avoid normalization,
as well?

> SELECT with HAVING clause containing OR conditional incorrectly return 1 row - should
return 2 rows - works correctly with 10.2 DB
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3257
>                 URL: https://issues.apache.org/jira/browse/DERBY-3257
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.3.2.1, 10.4.0.0
>            Reporter: Stan Bradbury
>            Assignee: Kathey Marsden
>         Attachments: 42X24_error.sql, d3257_doNOTCommit.patch, derby-3257_plan_10.2.txt,
derby-3257_plan_10.4.txt, TestHaving.java
>
>
> Attached program demonstrates the problem.  Only one count is returned (matching CODE=
GBR) - the count of CODE=CHA is not returned.  Works fine with versions 10.1 and 10.2 or if
program is run using 10.3 jars and 10.2 database (soft upgrade).
> Query:
> SELECT COUNT(t0.ID) FROM CTS1.TEST_TABLE t0 
>   GROUP BY t0.CODE 
>     HAVING (t0.CODE = 'GBR' OR t0.CODE = 'CHA') AND t0.CODE IS NOT NULL
> Incorrect results (see last line):
> Database product: Apache Derby
> Database version: 10.3.1.5 - (579866)
> Driver name:      Apache Derby Embedded JDBC Driver
> Driver version:   10.3.1.5 - (579866)
> result: 2
> Correct results:
> Database product: Apache Derby
> Database version: 10.2.2.0 - (485682)
> Driver name:      Apache Derby Embedded JDBC Driver
> Driver version:   10.2.2.0 - (485682)
> result: 4
> result: 2

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message