db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-47) Some possible improvements to IN optimization
Date Tue, 13 Mar 2007 22:11:09 GMT

     [ https://issues.apache.org/jira/browse/DERBY-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mike Matrigali updated DERBY-47:
--------------------------------


I took a look at the diffs in readlocks and I believe all are "correct" with respect to your
changes.  It looks to me like there is an existing bug not affected by your code, see case
6 below.  The readlocks test runs the same set of tests through multiple 
setups varying isolation level, rows size and unique vs. non-unique index.  In all cases the
diffs you are seeing are due to 
a test of locks gotten from walking a cursor down the result set from: select a from a where
a = 5 or a = 7' .  As described it is expected that you IN logic change should apply to this
query when there is an index on a.  I have described what I think is happening in the diffs
below, diff line numbers come from applying your master patch to a current codeline and then
running svn diff.  

Diff line notation is from a svn diff after applying the master patch
posted to DERBY-47

1) diff lines:4946,7 +4946,7 @@^M
    o diff ok
    o before change first next would bulk load all rows leaving a "scan lock"
      on the last page (3,1).   Now on first next code does a probe for the
      5 from the (a=5 or a=7), so first lock query shows scan lock on (2,1)
      associated with 5.  There are no real row locks as this is read
      uncommitted test.

2) @@ -8103,6 +8103,7 @@^M
   @@ -8112,6 +8113,7 @@^M
    o diff ok, shows 1 more row lock after each next in an expected OR case
    o before change first next would bulk load all rows and by time lock
      query is executed all locks would be released due to read committed.
      Now because of probing we only get locks as we probe each value and
      test shows lock is held during next call and then released when we
      next to the next value.

3) @@ -8956,6 +8958,7 @@^M
   @@ -8965,6 +8968,7 @@^M
   o diff ok, same reasoning as 2

4) @@ -11255,7 +11259,8 @@^
   @@ -11265,6 +11270,7 @@^M
   o diff ok, same reasoning as 2 - row numbers are different from 2 because
     of different padding in test table.

5) @@ -12101,6 +12107,7 @@^M
   @@ -12110,6 +12117,7 @@^M
   o diff ok, same reasoning as 4

6) @@ -14746,7 +14754,6 @@^M
   o I think there is a bug in existing code, the incremental diff looks ok.
   o there should not be any locks left after the scan_cursor is closed in
     read committed but there at line 14762 of original test.

7) @@ -15752,7 +15759,6 @@^M
   o same as 6

8) @@ -18421,9 +18427,8 @@^M

   o same as 6

9) @@ -19421,7 +19426,6 @@^M
   o same as 6

10) @@ -21779,8 +21783,6 @@^M]
   o diff ok!!
   o this is a good test that shows that the new code only visits the 2 rows
     of the or clause and does not get locks on any other rows under
     serializable with a unique index.
     Old change shows it scaning the range an unnecessarily
     locking an extra row.

11) @@ -21791,7 +21793,6 @@^M
   o diff ok
   o same as 10

12) @@ -21799,7 +21800,6 @@^
   o diff ok
   o same as 10

13) @@ -22639,8 +22639,6 @@^M
   o diff ok
   o not as good a test as 10.  Because of previous key locking and the very
     small data set both before and after we lock the same number of rows.
     Diff does show difference in processing between before and after.  If
     there had been more than one row between 5 and 7 with the non-unique
     index it would have shown less rows locked under new code vs. old code.
     Adding a test for "IN(1, 7)" would show this off.  If you are going to
     add new test I would suggest checking in current set of diffs and then
     adding separate test  as it is easier to identify diffs from
     new tests.

14) @@ -24974,11 +24972,9 @@^M
    o diff ok
    o same as 13

15) @@ -25831,8 +25827,6 @@^M
    o same as 13

> Some possible improvements to IN optimization
> ---------------------------------------------
>
>                 Key: DERBY-47
>                 URL: https://issues.apache.org/jira/browse/DERBY-47
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.0.2.0
>         Environment: all
>            Reporter: Sunitha Kambhampati
>         Assigned To: A B
>         Attachments: d47_engine_doNotCommit_v1.patch, d47_engine_doNotCommit_v1.stat,
d47_mp_addlTestCases.patch, d47_mp_CBO_MoAP_v1.patch, d47_mp_CBO_MoAP_v1.stat, d47_mp_codeGen_v1.patch,
d47_mp_codeGen_v1.stat, d47_mp_exec_v1.patch, d47_mp_exec_v1.stat, d47_mp_masters_v1.patch,
d47_mp_preprocess_v1.patch, d47_mp_preprocess_v1.stat, d47_mp_relOpPredCheck_v1.patch, d47_mp_relOpPredCheck_v1.stat,
derby-47-performance-data.txt, derby-47-performance-data.txt, Derby47PerformanceTest.java,
Derby47PerformanceTest.java, InListOperatorNode.java, QueryPlanUniqueIndexAndWordIndexOneTerm.txt,
QueryPlanUniqueIndexAndWordIndexTwoTerms.txt, QueryPlanUniqueIndexOnlyOneTerm.txt, QueryPlanUniqueIndexOnlyTwoTerms.txt,
readlocks.diff, readlocks_withContext.diff, readlocks_withContext.diff
>
>
> Consider a simple case of  - 
> A table tbl has 10000 rows, there is a primary key index on i1
> and the query in question is 
>  select * from tbl where i1 in (-1,100000)
> derby does a table scan of the entire table even though the "IN" list has only two values
and the comparison is on a field that has an index.
> Briefly looking at the code, it seems like we insert a between and use the IN list to
get the start and stop values for the scan. Thus the range of the values in the "IN" list
here plays an important role. 
> Thus if the query was changed to select * from tbl where i1 in (-1, 1), an index scan
would be chosen.
> It would be nice if we could do something clever in this case where there is clearly
an index on the field and the number of values in the IN list is known. Maybe use the rowcount
estimate and the IN list size to do some optimizations.  
> - consider the length of the "IN" list to do searches on the table.  ie use the IN list
values to do index key searches on the table,
> -or try to convert it to a join. Use the "IN" list values to create a temporary table
and do a join. It is most likely that the optimizer will choose the table with "IN" list here
as the outer table in the join and thus will do key searches on the larger table. 
> -------------------------------------------------------------------
> some query plans that I logged using derby.language.logQueryPlan=true for some similar
queries:
> Table has ascending values from 0 - 9999 for i1. primary key index on i1.
> GMT Thread[UT0,5,main] (XID = 19941), (SESSIONID = 0), select * from scanfixed where
i1 in (-1,9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict ResultSet
(2):
> Number of opens = 1
> Rows seen = 10000
> Rows filtered = 9990
> restriction = true
> projection = false
> 	constructor time (milliseconds) = 0
> 	open time (milliseconds) = 0
> 	next time (milliseconds) = 0
> 	close time (milliseconds) = 0
> 	restriction time (milliseconds) = 0
> 	projection time (milliseconds) = 0
> 	optimizer estimated row count:          750.38
> 	optimizer estimated cost:         8579.46
> Source result set:
> 	Table Scan ResultSet for SCANFIXED at read committed isolation level using instantaneous
share row locking chosen by the optimizer
> 	Number of opens = 1
> 	Rows seen = 10000
> 	Rows filtered = 0
> 	Fetch Size = 16
> 		constructor time (milliseconds) = 0
> 		open time (milliseconds) = 0
> 		next time (milliseconds) = 0
> 		close time (milliseconds) = 0
> 		next time in milliseconds/row = 0
> 	scan information: 
> 		Bit set of columns fetched=All
> 		Number of columns fetched=9
> 		Number of pages visited=417
> 		Number of rows qualified=10000
> 		Number of rows visited=10000
> 		Scan type=heap
> 		start position: 
> null		stop position: 
> null		qualifiers:
> Column[0][0] Id: 0
> Operator: <=
> Ordered nulls: false
> Unknown return value: false
> Negate comparison result: false
> Column[0][1] Id: 0
> Operator: <
> Ordered nulls: false
> Unknown return value: true
> Negate comparison result: true
> 		optimizer estimated row count:          750.38
> 		optimizer estimated cost:         8579.46
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> l
> 2004-10-14 18:59:47.577 GMT Thread[UT0,5,main] (XID = 19216), (SESSIONID = 0), select
* from scanfixed where i1 in (9999,9998,9997,9996,9995,9994,9993,9992,9991,9990) ******* Project-Restrict
ResultSet (3):
> Number of opens = 1
> Rows seen = 10
> Rows filtered = 0
> restriction = true
> projection = true
> 	constructor time (milliseconds) = 0
> 	open time (milliseconds) = 0
> 	next time (milliseconds) = 0
> 	close time (milliseconds) = 0
> 	restriction time (milliseconds) = 0
> 	projection time (milliseconds) = 0
> 	optimizer estimated row count:            4.80
> 	optimizer estimated cost:           39.53
> Source result set:
> 	Index Row to Base Row ResultSet for SCANFIXED:
> 	Number of opens = 1
> 	Rows seen = 10
> 	Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6, 7, 8}
> 		constructor time (milliseconds) = 0
> 		open time (milliseconds) = 0
> 		next time (milliseconds) = 0
> 		close time (milliseconds) = 0
> 		optimizer estimated row count:            4.80
> 		optimizer estimated cost:           39.53
> 		Index Scan ResultSet for SCANFIXED using index SCANFIXEDX at read committed isolation
level using instantaneous share row locking chosen by the optimizer
> 		Number of opens = 1
> 		Rows seen = 10
> 		Rows filtered = 0
> 		Fetch Size = 16
> 			constructor time (milliseconds) = 0
> 			open time (milliseconds) = 0
> 			next time (milliseconds) = 0
> 			close time (milliseconds) = 0
> 			next time in milliseconds/row = 0
> 		scan information: 
> 			Bit set of columns fetched=All
> 			Number of columns fetched=2
> 			Number of deleted rows visited=0
> 			Number of pages visited=2
> 			Number of rows qualified=10
> 			Number of rows visited=10
> 			Scan type=btree
> 			Tree height=2
> 			start position: 
> 	>= on first 1 column(s).
> 	Ordered null semantics on the following columns: 
> 			stop position: 
> 	> on first 1 column(s).
> 	Ordered null semantics on the following columns: 
> 			qualifiers:
> None
> 			optimizer estimated row count:            4.80
> 			optimizer estimated cost:           39.53

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message