db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Army <qoz...@gmail.com>
Subject Re: Question regarding runtimestatistics and join order
Date Thu, 02 Oct 2008 17:48:03 GMT
Kathey Marsden wrote:
>
> As for how to incorporate this into RuntimeStatisticsParser, the only 
> thing I can think of is to add an boolean orderedSearchStrings(String[] 
> searchStrings) method to RuntimeStatistics parser that will search for 
> the specified strings in order and return true if they are there in the 
> order they are in the array.

I think this works so long as:

   1) the table names in the query do not appear elsewhere in the query 
plan (ex. a table name of "T" would match the first letter of the word 
"Table" in "Hash Table ResultSet", which we wouldn't want), and

   2) the argument array passed to the new function includes *ALL* 
tables in the query, not just a subset.

With respect to #2, if my query is of the form:

   select ... from
     (select ... from t2, t1, t3 where ...) X1
     (select ... from t1, t2 where ...) X2
   where ...

Assume a test wants to verify that the tables in subquery X2 have a join 
order of { T2, T1 }, but doesn't really care about the join order of the 
subquery in X1, nor does it care about the order of X1 w.r.t. X2.  You'd 
*still* have to make sure that the array passed into the ordered search 
method includes the join order for X1, as well, otherwise the test might 
incorrectly pass.

For example, if we only check for the join order of the "targeted" 
subquery X2, meaning we pass ["T2", "T1"] into the proposed method and 
ignore X1 altogether, then the test would IN-correctly PASS for the 
following query plan:

   ProjectRestrict:
   +++ JoinNode_0:
   ++++++ LeftResultSet:                  <== This corresponds to X1
   +++++++++ JoinNode_1:
   ++++++++++++ LeftResultSet:
   +++++++++++++++ JoinNode_2:
   ++++++++++++++++++ LeftResultSet: T3
   ++++++++++++++++++ RightResultSet: T2
   ++++++++++++ RightResultSet: T1
   ++++++ RightResultSet:                 <== This corresponds to X2
   +++++++++ JoinNode_3:
   ++++++++++++ LeftResultSet: T1
   ++++++++++++ RightResultSet: T2


If you just search for "T1" followed by "T2", the test will pass because 
the join order for X1 matches--but that's wrong because it's really X2 
that we wanted to check.

If instead of ["T1", "T2"] you pass in ["T3", "T2", "T1", "T2", 
"T1"]--i.e. include *ALL* tables in the query, even the ones that aren't 
necessarily targeted--then I think you'd get the desired behavior. The 
downside to this is that the test will fail if a join order about which 
we "don't care" changes (ex. the join order for X1 in this case).  But 
that's how things work today with the canon-based test, as well, so even 
if it's not ideal, at least it wouldn't really be any worse...

To get the ideal behavior (where the test fails if and only if the 
"targeted" subquery's join order is not what is expected) with the 
proposed orderedSearchStrings() approach, one would have to ensure that 
the table names used in the targeted subquery do not appear anywhere 
else in the query.  My guess is that you would have to rewrite a good 
number of tests to guarantee that, which would probably be non-trivial.

So it seems like the easiest approach would be to follow Kathey's 
suggestion, but make sure that all tests which use the new method pass 
in a full list of all base table names in the query (not just a targeted 
subset).

Army


Mime
View raw message