db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-6317) Optmizer can choose the wrong path when BTreeCostController.java returns an estimate cost and row count of 0.0
Date Sat, 31 Aug 2013 01:57:51 GMT

     [ https://issues.apache.org/jira/browse/DERBY-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mamta A. Satoor updated DERBY-6317:
-----------------------------------

    Attachment: testRepro_v1.txt
                DERBY_6317_temp_changes_for_debugging.txt

I had to make a change to the earlier test program since that program was inserting duplicate
values in non-unique foreign key on Table2. It now inserts all unique values in foreign key
corresponding to Table3 and I have been able to reproduce the problem. I have attached the
repro test case as testRepro_v1.txt. The program creates threee tables as shown below
        CREATE TABLE Table1 (ID int PRIMARY KEY NOT NULL);
        CREATE TABLE Table2 (Table1_ID int NOT NULL, Table3_ID int NOT NULL,
		CONSTRAINT TABLE2_PK PRIMARY KEY (Table1_ID,Table3_ID));
        CREATE TABLE Table3 (ID int PRIMARY KEY NOT NULL); 

        ALTER TABLE table2 ADD CONSTRAINT TABLE2_FK_1 FOREIGN KEY (Table1_ID) REFERENCES TABLE1(ID);
        ALTER TABLE table2 ADD CONSTRAINT TABLE2_FK_2 FOREIGN KEY (Table3_ID) REFERENCES TABLE3(ID);

The program inserts 1million rows in Table1 and 8 million rows in both Table2 and Table3.
After the data is loaded, it selects each of the 8 million rows from Table2 individually with
a query like following(the constant assigned to
t0.Table3_ID iterates from 0 to 8million-1).
SELECT count(*) FROM 
	Table1 T1,
	Table2 t0 
	WHERE t1.ID = t0.Table1_ID and 
		t0.Table3_ID = 0;

Additionally, I have two printlns in BTreeCostController.getScanCost towards the bottom of
the method. I have attached svn diff for changes in BTreeCostController.getScanCost as DERBY_6317_temp_changes_for_debugging.txt.
The method will also have the changes suggested by Mike in his second patch but those changes
are commented out.
		.....
		//Added this println and if statement with println. The rest of the code is unchanged
        	System.out.println("In engine estimated row count is " + estimated_row_count);
        	if (estimated_row_count < 0.5)
	            	System.out.println("We are going to round estimated row count to 0");
            // RESOLVE - should we make sure this number is > 0?
            cost_result.setEstimatedRowCount(Math.round(estimated_row_count));
        }
        finally

Run the test program after making the changes in BTreeCostController.getScanCost as follows.
java org.apache.derbyTesting.functionTests.tests.lang.MamtaJDBC > dellater.txt
It will produce a really large dellater.txt. notepad couldn't open the file because of it's
size. I did grep on the file to look for string "We are going to round estimated row count
to 0" and found number of occurences of it. An example of the string were as follows
./dellater2.txt:51892862:We are going to round estimated row count to 0
./dellater2.txt:51892866:We are going to round estimated row count to 0
./dellater2.txt:51892904:We are going to round estimated row count to 0
./dellater2.txt:51892908:We are going to round estimated row count to 0
./dellater2.txt:51892946:We are going to round estimated row count to 0
./dellater2.txt:51892950:We are going to round estimated row count to 0
./dellater2.txt:51892988:We are going to round estimated row count to 0
./dellater2.txt:51892992:We are going to round estimated row count to 0
./dellater2.txt:51893030:We are going to round estimated row count to 0
./dellater2.txt:51893034:We are going to round estimated row count to 0
./dellater2.txt:51893072:We are going to round estimated row count to 0
.....

I picked the first line from grep output above and used that line number to find the value
of the constant in the SELECT query that resulted in estimated row count to be rounded to
0 and found following
$ head -n 51892862 dellater2.txt | tail -n 10
In engine estimated row count is 8000005.0
In engine estimated row count is 0.9536743
In engine estimated row count is 1000005.0
t0.Table3_ID =5189284
In engine estimated row count is 1000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.47683716
We are going to round estimated row count to 0

Next, I tried that t0.Table3_ID =5189284 value directly in ij with log query plan on and could
see in derby.log that we are doing table scan for the inner(right) resultset. I tried value
one higher than the problem ID value and saw that we use index scan for that SELECT query.
$ java -Dderby.language.logQueryPlan=true -Dij.exceptionTrace=true org.apache.derby.tools.ij
ij version 10.11
ij> connect 'jdbc:derby:clobtest3';
ij> SELECT count(*) FROM
        Table1 T1,
        Table2 t0
        WHERE t1.ID = t0.Table> 1_ID and
                t0.Tabl> e3_ID = 5189284 ;
In engine estimated row count is 1000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.47683716
We are going to round estimated row count to 0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.47683716
We are going to round estimated row count to 0
In engine estimated row count is 1000005.0
1
-----------
1

1 row selected
ij> SELECT count(*) FROM
        Table1 T1,
        Table2 t0
        WHERE t1.ID = t0.Table1_ID and
              t0.Table3_ID = 5189285;
In engine estimated row count is 1000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.9536743
In engine estimated row count is 8000005.0
In engine estimated row count is 8000005.0
In engine estimated row count is 0.9536743
In engine estimated row count is 1000005.0
1
-----------
1

1 row selected


I applied Mike's 2nd patch which check if estimated row count is less than <.5 and if so,
it assigns value 1 to estimated row count. With that patch, I ran the 2 queries again in ij
against the database and saw that we are indeed using index scan for both the queries. So,
the patch definitely fixed the problem with the repro db I have created.

The test program takes over 4-5hours to load the data and run 8millions selects.

                
> Optmizer can choose the wrong path when BTreeCostController.java returns an estimate
cost and row count of 0.0
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-6317
>                 URL: https://issues.apache.org/jira/browse/DERBY-6317
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.8.2.2
>         Environment: Derby 10.8.2.2 on Oracle Solaris 10 
>            Reporter: Brett Bergquist
>            Assignee: Mike Matrigali
>         Attachments: derby6317_2.diff, derby6317.diff, DERBY_6317_temp_changes_for_debugging.txt,
testRepro_v1.txt
>
>
> The optimizer can chose the wrong path when BTreeCostController.java returns an estimate
cost and row count of 0.0.  
> Assume that you have two tables that are being joined like:
> SELECT * FROM T1, T0
> WHERE T1.ID = T0.F_ID and
> T0.ID = 3;
> Also assume that T0 has two columns, ID and F_ID and F_ID is a foreign key on T1.ID.
  Assume that T1.ID is the primary key of T1 and (T0.F_ID, T0.ID) is the primary key on T0.
 Assume that there is a non-unique index on T0.ID.
> The correct query plan for this should be to query T0 using the non-unique index on T0.ID
and then use the foreign key value in those rows to do query T1 using the primary key on T1.
> With some values of T0.ID in the above query this query plan is chosen and works.  With
other values of T0.ID , the query plan does an query on T0 using the non-unique index on T0.ID
and then does a table scan on T1.
> For example, in my case the query:
> SELECT * FROM T1, T0
> WHERE T1.ID = T0.F_ID and
> T0.ID = 22112129;
> has this query plan.   
> The problem appears to be in BTreeCostController.java.  When this returns the same value
for the "left_of_start" and the "left_of_stop" (which is being used to estimate the number
of rows and cost), then the estimate cost and row count becomes 0.0.   When this is used in
the join order of T0, T1, then the cost of the table scan for T1 becomes 0.0 as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message