db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mamta Satoor <msat...@gmail.com>
Subject Re: [PATCH] (DERBY-251) DISTINCT query is returning duplicate rows
Date Wed, 11 May 2005 20:01:06 GMT
Hi Jack,

Appreciate you taking the time to do the review and catching the
predicate with expression. I have changed the code and also added a
test case for it. I have fired the derbyall suite on my codeline to
make sure everything else runs smoothly. Attached is the updated patch
anyways.

thanks,
Mamta

On 5/11/05, Jack Klebanoff <klebanoff-derby@sbcglobal.net> wrote:
> The patch does not completely fix the problem. It does not handle the
> case where the exists table column is embedded in an expression. Try the
> following variation on the test select:
> 
> select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
> (select  q3."NO1" from IDEPT q3 where  ( ABS(q3."REPORTTO_NO") =  q2."NO1")))
> 
> Because q3."REPORTTO_NO" is inside a call to ABS the code added to
> FromList.returnsAtMostSingleRow does not see it.
> 
> I would suggest using
> 
>                        JBitSet referencedTables =
> and.getLeftOperand().getTablesReferenced();
>                        if( referencedTables.get( existsTableNumber))
>                        {
> 
> predicatesTemp.removeElementAt(predicatesTempIndex);
>                            break;
>                        }
> 
> instead of
>                        BinaryRelationalOperatorNode beon =
> (BinaryRelationalOperatorNode)
>                            and.getLeftOperand();
>                        ValueNode left = beon.getLeftOperand();
>                        ValueNode right = beon.getRightOperand();
> 
>                        /* If left or right side of predicate refer to
> exists base table,
>                        then remove it */
>                        if ((left instanceof ColumnReference) &&
>                            ((ColumnReference) left).getTableNumber() ==
> existsTableNumber)
>                        {
> 
> predicatesTemp.removeElementAt(predicatesTempIndex);
>                            break;
>                        }
>                        else if ((right instanceof ColumnReference) &&
>                            ((ColumnReference) right).getTableNumber()
> == existsTableNumber)
>                        {
> 
> predicatesTemp.removeElementAt(predicatesTempIndex);
>                            break;
>                        }
> 
> I have tried it out and it seems to work.
> 
> Jack Klebanoff
> 
> Mamta Satoor wrote:
> 
> >Hi,
> >
> >I have a patch for this optimizer bug. Basically, the issue turned out
> >to be the logic for DISTINCT elimination. During the optimization
> >phase, if a query has DISTINCT clause, then impl.sql.compile.FromList
> >class's returnsAtMostSingleRow() method gets called. This method
> >returns true if the method concludes that DISTINCT in the query is
> >redundant (based on a complex logic that decides that the query is
> >going to return distinct rows on its own without the DISTINCT clause.
> >The details of the current logic for DISTINCT elimination can be found
> >in the comments at the method level.)
> >
> >For the query in question in this bug, the method returned true for
> >DISTINCT elimination which is wrong. The explanation is as follows.
> >
> >First of all, I was able to simplify the query reported in the bug to
> >following query.
> >select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> >where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> >and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
> >(select  q3."NO1" from IDEPT q3 where  (q3."REPORTTO_NO" =  q2."NO1")))
> >
> >This query gets converted to following during optimization
> >select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> >where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> >and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  not exists (
> >(select  q3."NO1" from IDEPT q3 where
> >(  q3."REPORTTO_NO" =  q2."NO1"  and q3."NO1" = q1."NO1") ) )  ;
> >
> >This optimized query has 4 predicates associated with it
> >q3.reportto_no = q2.no1
> >q2.discrim_dept = 'HardwareDept'
> >q1.descrim_dept = 'SoftwareDept'
> >q1.no1 = q3.no1
> >
> >Next, on this optimized query(since it has DISTINCT clause in it), the
> >returnsAtMostSingleRow() method gets called. The method incorrectly
> >returns true indicating that DISTINCT can be eliminated. The reason
> >for this is that method is looking at predicates that belong to the
> >inside query with the exists clause (which is on table IDEPT q3) to
> >determine DISTINCT elimination for the outer level.
> >
> >The fix is that the predicates from the exists query, (in this
> >particular case, q3."NO1" = q1."NO1" and q3.reportto_no = q2.no1)
> >should not be considered when deciding elimination of DISTINCT in the
> >outer query. That is what the attached patch does.
> >
> >Hope this helps understand the problem and the proposed fix for it.
> >The files impacted by the change are as follows
> >svn stat
> >M java\engine\org\apache\derby\impl\sql\compile\FromList.java
> >M java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
> >M java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
> >
> >Please send in comments you may have. I have run the existing tests
> >and the patch didn't cause any failures.
> >
> >thanks,
> >Mamta
> >
> >
> 
>

Mime
View raw message