db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dag H. Wanvik (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4471) Left outer join reassociation rewrite gives wrong result
Date Sat, 17 Jul 2010 22:26:49 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Dag H. Wanvik updated DERBY-4471:
---------------------------------

    Attachment: derby-4471-1c.diff
                derby-4471-1c.stat

Actually, the my first patch was both too liberal (due to bug) and too
restrictive (design). I instrumented the code to show when reorderings
took place before and after.

Uploading version 1c of this patch. It now allows all correct
reorderings that the old implementation did, but still fixes the wrong
ones seen in this issue. In addition, it opens up for reordering in a
few cases where the old implementation did not, by improving the
analysis for null rejecting join predicates, cf. the equation
mentioned in the paper by Galindo-Legaria et al:
    
  Prs     Pst            Prs     Pst
R ---> (S ---> T)  == (R ---> S) ---> T

where ---> is (left) outer join, Pxy is a null rejecting join predicate.

The old implementation, as well as my previous version of this patch,
required equality operators, but the new implementation allows for the
other relational operators as well that also deny nulls, of this form
(the join predicate is in CNF at this time in processing):

       X relop Y [and .. and X-n relop Y-n]

where at least one relop refers to columns from both R,S or S,T. 

Note that the predicate analysis could be liberalized even more at the
cost of a more sophisticated analysis, e.g. "R.x1 = S.y1 OR R.x2 = S.y2"
would be safe, but disjunction (OR) is currently not accepted.

The major problem with the old implementation was that, although it
did check that Prs did not reference T, it did not assert Pst.

New test cases have been added to OuterJoinTest that pass with the new
implementation, but gave wrong results with the old, cf. fixtures
OuterJoinTest#testDerby_4471a and OuterJoinTest#testDerby_4471b().

The old style test lojreorder.sql has been converted to the JUnit test
LojReorderTest. The test cases that now reorder and did not before,
are annotated in LojReorderTest, look for the string 4471, or cf the
list below.
     
I added a couple of new methods to be able to assert contents in the
execution plan, to make it convenient to verify that the reorderings
actually took place, cf. JDBC.checkPlan and
RuntimeStatisticsParser#assertSequence.

Since HalfOuterJoinNode#LOJ_reorderable has largely been rewritten, it
would probably be best to review the changes by looking at the old and
the new versions in full. The  predicate checking has been factored
out in the method isNullRejecting.

[Appendix]
New reorderings from lojreorder.sql. All compute correctly as before.

After conversion to JUnit, I have added assertions on the query plans to
show that these reordering now actually take place.

- select * from t left outer join (s left outer join r on (f = i)) on (b = e and a > b)

- select * from t left outer join (s left outer join r on (f = i)) on (b = e and a = b);

- select * from t left outer join (s left outer join r on (f = i)) on (b = e and 1 = 1)

- select * from t left outer join (s left outer join r on (f = i)) on (b > e)

- Makes case 1.13 work: the test says its reordered, but actually it
  is not (the "and 1=1" threw old impl off).

- Case 1.15 now reorders ("on b1>c1" is ok) - Note to self: update comment.

- Makes case 1.31 work, now reorders twice, earlier none ("on a1=b1 and b2=2" 
  threw old impl off).


> Left outer join reassociation rewrite gives wrong result
> --------------------------------------------------------
>
>                 Key: DERBY-4471
>                 URL: https://issues.apache.org/jira/browse/DERBY-4471
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.0.2.0, 10.0.2.1, 10.1.1.0, 10.1.2.1, 10.1.3.1, 10.2.1.6, 10.2.2.0,
10.3.1.4, 10.3.2.1, 10.3.3.0, 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0
>            Reporter: Dag H. Wanvik
>            Assignee: Dag H. Wanvik
>         Attachments: derby-4471-1a.diff, derby-4471-1a.stat, derby-4471-1b.diff, derby-4471-1b.stat,
derby-4471-1c.diff, derby-4471-1c.stat, derby-4471-junit-repro.diff, query_plan_derby_4471.pdf
>
>
> The following script and output shows the problem:
> > create table r(c1 char(1));
> > create table s(c1 char(1), c2 char(1));
> > create table t(c1 char(1));
> > insert into r values 'a';
> > insert into s values ('b', default);
> > insert into t values ('c');
> > select * from s left outer join t on s.c2=t.c1 or s.c2 is null;
> C1  |C2  |C1  
> --------------
> b   |NULL|c   
> > select * from r left outer join s on r.c1=s.c1;
> C1  |C1  |C2  
> --------------
> a   |NULL|NULL
> > select * from (r left outer join s on r.c1=s.c1) left outer join t on s.c2=t.c1
or s.c2 is null;
> C1  |C1  |C2  |C1  
> -------------------
> a   |NULL|NULL|c   
> > select * from r left outer join (s left outer join t on s.c2=t.c1 or s.c2 is null)
on r.c1=s.c1;
> C1  |C1  |C2  |C1  
> -------------------
> a   |NULL|NULL|c   
> The last result is wrong. The correct answer should be:
> C1  |C1  |C2  |C1  
> -------------------
> a   |NULL|NULL|NULL   
> since in the last form, the left table r has the value 'a', which does
> not match any row in result of the compound inner given the join
> predicate ("r.c1=s.c1"), so all nulls should be appended to the 'a'
> from the outer table r.
> This happens because internally the last form is rewritten to the
> second but the last form (left-deep), but this rewrite is not
> justified here unless the join predicate on s rejects null, which the
> present one explicitly does not ("or s.c2 is null"). Cf. for example
> [1], page 52, which describes this transform and its prerequisite
> condition as indentity #7.
> [1] Galindo-Legaria, C. & Rosenthal, A.: "Outerjoin simplification and
> reordering for query optimization", ACM Transactions on Database
> Systems, Vol 22, No 1, March 1997.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message