db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mamta Satoor <msat...@gmail.com>
Subject Re: [PATCH] (DERBY-251) DISTINCT query is returning duplicate rows
Date Tue, 10 May 2005 14:40:13 GMT
Can someone please review this for me?

thanks,
Mamta

On 5/5/05, Mamta Satoor <msatoor@gmail.com> wrote:
> Hi,
> 
> I have a patch for this optimizer bug. Basically, the issue turned out
> to be the logic for DISTINCT elimination. During the optimization
> phase, if a query has DISTINCT clause, then impl.sql.compile.FromList
> class's returnsAtMostSingleRow() method gets called. This method
> returns true if the method concludes that DISTINCT in the query is
> redundant (based on a complex logic that decides that the query is
> going to return distinct rows on its own without the DISTINCT clause.
> The details of the current logic for DISTINCT elimination can be found
> in the comments at the method level.)
> 
> For the query in question in this bug, the method returned true for
> DISTINCT elimination which is wrong. The explanation is as follows.
> 
> First of all, I was able to simplify the query reported in the bug to
> following query.
> select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  ( q1."NO1" <> ALL
> (select  q3."NO1" from IDEPT q3 where  (q3."REPORTTO_NO" =  q2."NO1")))
> 
> This query gets converted to following during optimization
> select  distinct  q1."NO1" from IDEPT q1, IDEPT q2
> where  ( q2."DISCRIM_DEPT" = 'HardwareDept')
> and  ( q1."DISCRIM_DEPT" = 'SoftwareDept')  and  not exists (
> (select  q3."NO1" from IDEPT q3 where
> (  q3."REPORTTO_NO" =  q2."NO1"  and q3."NO1" = q1."NO1") ) )  ;
> 
> This optimized query has 4 predicates associated with it
> q3.reportto_no = q2.no1
> q2.discrim_dept = 'HardwareDept'
> q1.descrim_dept = 'SoftwareDept'
> q1.no1 = q3.no1
> 
> Next, on this optimized query(since it has DISTINCT clause in it), the
> returnsAtMostSingleRow() method gets called. The method incorrectly
> returns true indicating that DISTINCT can be eliminated. The reason
> for this is that method is looking at predicates that belong to the
> inside query with the exists clause (which is on table IDEPT q3) to
> determine DISTINCT elimination for the outer level.
> 
> The fix is that the predicates from the exists query, (in this
> particular case, q3."NO1" = q1."NO1" and q3.reportto_no = q2.no1)
> should not be considered when deciding elimination of DISTINCT in the
> outer query. That is what the attached patch does.
> 
> Hope this helps understand the problem and the proposed fix for it.
> The files impacted by the change are as follows
> svn stat
> M java\engine\org\apache\derby\impl\sql\compile\FromList.java
> M java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
> M java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
> 
> Please send in comments you may have. I have run the existing tests
> and the patch didn't cause any failures.
> 
> thanks,
> Mamta
> 
> 
>

Mime
View raw message