db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jack Klebanoff <klebanoff-de...@sbcglobal.net>
Subject Re: [PATCH] (DERBY-251) DISTINCT query is returning duplicate rows
Date Wed, 11 May 2005 18:51:33 GMT
The patch does not completely fix the problem. It does not handle the 
case where the exists table column is embedded in an expression. Try the 
following variation on the test select:

select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
(select  q3."NO1" from IDEPT q3 where  ( ABS(q3."REPORTTO_NO") =  q2."NO1")))

Because q3."REPORTTO_NO" is inside a call to ABS the code added to 
FromList.returnsAtMostSingleRow does not see it.

I would suggest using

                        JBitSet referencedTables = 
and.getLeftOperand().getTablesReferenced();
                        if( referencedTables.get( existsTableNumber))
                        {
                            
predicatesTemp.removeElementAt(predicatesTempIndex);
                            break;
                        }

instead of
                        BinaryRelationalOperatorNode beon = 
(BinaryRelationalOperatorNode)
                            and.getLeftOperand();
                        ValueNode left = beon.getLeftOperand();
                        ValueNode right = beon.getRightOperand();

                        /* If left or right side of predicate refer to 
exists base table,
                        then remove it */
                        if ((left instanceof ColumnReference) &&
                            ((ColumnReference) left).getTableNumber() == 
existsTableNumber)
                        {
                            
predicatesTemp.removeElementAt(predicatesTempIndex);
                            break;
                        }
                        else if ((right instanceof ColumnReference) &&
                            ((ColumnReference) right).getTableNumber() 
== existsTableNumber)
                        {
                            
predicatesTemp.removeElementAt(predicatesTempIndex);
                            break;
                        }

I have tried it out and it seems to work.

Jack Klebanoff

Mamta Satoor wrote:

>Hi,
>
>I have a patch for this optimizer bug. Basically, the issue turned out
>to be the logic for DISTINCT elimination. During the optimization
>phase, if a query has DISTINCT clause, then impl.sql.compile.FromList
>class's returnsAtMostSingleRow() method gets called. This method
>returns true if the method concludes that DISTINCT in the query is
>redundant (based on a complex logic that decides that the query is
>going to return distinct rows on its own without the DISTINCT clause.
>The details of the current logic for DISTINCT elimination can be found
>in the comments at the method level.)
>
>For the query in question in this bug, the method returned true for
>DISTINCT elimination which is wrong. The explanation is as follows.
>
>First of all, I was able to simplify the query reported in the bug to
>following query.
>select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
>where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
>and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
>(select  q3."NO1" from IDEPT q3 where  (q3."REPORTTO_NO" =  q2."NO1")))
>
>This query gets converted to following during optimization
>select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
>where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
>and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  not exists (
>(select  q3."NO1" from IDEPT q3 where
>(  q3."REPORTTO_NO" =  q2."NO1"  and q3."NO1" = q1."NO1") ) )  ;
>
>This optimized query has 4 predicates associated with it
>q3.reportto_no = q2.no1
>q2.discrim_dept = 'HardwareDept'
>q1.descrim_dept = 'SoftwareDept'
>q1.no1 = q3.no1
>
>Next, on this optimized query(since it has DISTINCT clause in it), the
>returnsAtMostSingleRow() method gets called. The method incorrectly
>returns true indicating that DISTINCT can be eliminated. The reason
>for this is that method is looking at predicates that belong to the
>inside query with the exists clause (which is on table IDEPT q3) to
>determine DISTINCT elimination for the outer level.
>
>The fix is that the predicates from the exists query, (in this
>particular case, q3."NO1" = q1."NO1" and q3.reportto_no = q2.no1)
>should not be considered when deciding elimination of DISTINCT in the
>outer query. That is what the attached patch does.
>
>Hope this helps understand the problem and the proposed fix for it.
>The files impacted by the change are as follows
>svn stat
>M java\engine\org\apache\derby\impl\sql\compile\FromList.java
>M java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
>M java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
>
>Please send in comments you may have. I have run the existing tests
>and the patch didn't cause any failures.
>
>thanks,
>Mamta
>  
>


Mime
View raw message