Return-Path: Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: (qmail 19548 invoked from network); 18 Jul 2006 20:24:30 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 18 Jul 2006 20:24:30 -0000 Received: (qmail 99583 invoked by uid 500); 18 Jul 2006 20:24:28 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 99545 invoked by uid 500); 18 Jul 2006 20:24:28 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 99517 invoked by uid 99); 18 Jul 2006 20:24:28 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Jul 2006 13:24:28 -0700 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received: from [209.237.227.198] (HELO brutus.apache.org) (209.237.227.198) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 18 Jul 2006 13:24:27 -0700 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 294A641001E for ; Tue, 18 Jul 2006 20:22:19 +0000 (GMT) Message-ID: <21037857.1153254139166.JavaMail.jira@brutus> Date: Tue, 18 Jul 2006 13:22:19 -0700 (PDT) From: "A B (JIRA)" To: derby-dev@db.apache.org Subject: [jira] Updated: (DERBY-781) Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times. In-Reply-To: <52402641.1135307852215.JavaMail.jira@ajax.apache.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N [ http://issues.apache.org/jira/browse/DERBY-781?page=all ] A B updated DERBY-781: ---------------------- Derby Info: [Patch Available, Release Note Needed] (was: [Patch Available]) Possible RELEASE NOTE for this fix is as follows, based on suggestions from Bryan in the above-referenced thread: DERBY-781: Materialize subqueries in select list where possible to avoid creating invariant resultsets many times. The Derby optimizer has been enhanced so that it now considers the cost of performing a hash join with subqueries when it is safe to do so. If the cost of the hash join is better than a nested loop join, Derby will choose to do the hash join and will thereby materialize the subquery. WHAT CHANGED When optimizing a query that has one or more non-flattenable subqueries in the FROM clause, Derby will now check to see if it is possible to perform a hash join with that subquery as the inner table. Prior to Derby 10.2, the optimizer would never consider a hash join with a subquery; it only did nested loop joins. SYMPTOM Execution performance of queries containing non-flattenable subqueries may change. The expectation is that the new (10.2) query plans will show improved performance over the old ones. Another potential symptom is that the compilation time for such queries may increase. If this happens, the increase should only occur at compilation time; execution time should either improve or, at the very least, remain the same as in earlier versions of Derby. CAUSE If the optimizer chooses to do a hash join with a subquery, Derby only has to execute the subquery a single time per statement, after which Derby can just perform the desired join against the materialized result set. Depending on how many rows are in the outer table of the join, this once-per-statement execution of the subquery can lead to major performance improvements over the once-per-outer-row execution employed by earlier versions of Derby. As for the extra compilation time, this is due to the simple fact that the optimizer is now doing more work--i.e. in addition to considering nested loop joins with subqueries, it is now _also_ considering hash joins with those subqueries, and that means that it could potentially take longer for the optimizer to finish its work. Note again that, if it occurs, the increased time should only occur at compilation time; execution time should either improve or, at the very least, remain the same as in earlier versions of Derby. SOLUTION This was an intentional change to improve the execution plans chosen by the optimizer for queries having large and/or complex subqueries. The expectation is that the new behavior--and the subsequent query plans--will lead to improved performance over the old ones, so no further solution is required. WORKAROUND There is no way to disable/workaround this new behavior since the symptom as described above is a good one for Derby. That said, any user who notices a negative performance change after moving to Derby 10.2, and who believes that the difference in performance is related to this optimizer enhancement, is encouraged to visit the following "performance diagnosis" page and to follow up with his/her findings on the Derby mailing lists: http://wiki.apache.org/db-derby/PerformanceDiagnosisTips > Materialize union subqueries in select list where possible to avoid creating invariant resultsets many times. > ------------------------------------------------------------------------------------------------------------- > > Key: DERBY-781 > URL: http://issues.apache.org/jira/browse/DERBY-781 > Project: Derby > Issue Type: Improvement > Components: SQL > Affects Versions: 10.1.1.0, 10.2.0.0 > Environment: generic > Reporter: Satheesh Bandaram > Assigned To: A B > Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, DERBY-781_v1.html > > > Derby's handling of union subqueries in from list can be improved by materializing invariant resultsets once, rather than creating them many times. > For example: > create view V1 as select i, j from T1 union select i,j from T2; > create view V2 as select a,b from T3 union select a,b from T4; > insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5); > For a query like select * from V1, V2 where V1.j = V2.b and V1.i in (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. (assuming V2 is choosen as the the inner table) This can be very costly if the underlying selects can take long time and also may perform union many times. > Enhance materialization logic in setOperatorNode.java. It currently returns FALSE always. > public boolean performMaterialization(JBitSet outerTables) > throws StandardException > { > // RESOLVE - just say no to materialization right now - should be a cost based decision > return false; > /* Actual materialization, if appropriate, will be placed by our parent PRN. > * This is because PRN might have a join condition to apply. (Materialization > * can only occur before that. > */ > //return true; > } -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira