db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-5066) full table scan when index is used, taking extremely long time in JDBC
Date Thu, 24 Feb 2011 22:18:38 GMT

     [ https://issues.apache.org/jira/browse/DERBY-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mike Matrigali updated DERBY-5066:
----------------------------------


It is great you posted the 2 query plans, that is a good first step to getting help.  If possible
you should post db and actual test program.  If the
data is not appropriate to post it is usually not to hard to come up with dummy ddl and dummy
data to load into the db so that everyone can
see exactly what is going on.

I think there is something wierd going on here as I am not sure why one plan is being picked
over the other.  In the 2 plans that
you posted were there 500k rows?  It looks like the optimizer is estimating around 69k total
rows and the difference in the 2 plans
is that it thinks the >0 is going to return only around 22k rows, but looks like it return
all or almost all the rows.  This could be an issue with
statistics or index maintenance.  Do you see any difference if you run SYSCS_UTIL.SYSCS_COMPRESS_TABLE()
and rerun the queries.  This
will get rid of any old cruft in the index which helps with histogram estimates for queries,
and also will generate new row count estimates and
cardinality statistics if appropriate.
  
But if the code you posted is really what is being used your
test is flawed as it never looks at the result which is why I think you are seeing orders
of magnitude differences.   Sometimes
Derby will begin returning rows before the query has finished, I think this happens in all
cases except when a sort is required.

So your test is sort of measuring time to first row, rather than time to process the whole
query.  To be a true measure of time to first row
it should include a fetch of the first row and then end the measurement.   When the query
includes a sort this will be the
time to process all the rows, while with no sort this will return the first row very quickly.
 I know the optimizer does not take
time to first row into account for optimization.

Looking at the query plans you posted at least from the server side the difference is total
time from begin compilation to end execution is much closer,
with one being around 3 seconds and one being 8 seconds.

Also comparing these 2 queries can easily feed on each other.  If run one after another then
derby cache will help the second.  Even if the server
is cycled the OS cache is very likely to help the second query.  



> full table scan when index is used, taking extremely long time in JDBC
> ----------------------------------------------------------------------
>
>                 Key: DERBY-5066
>                 URL: https://issues.apache.org/jira/browse/DERBY-5066
>             Project: Derby
>          Issue Type: Bug
>          Components: Eclipse Plug-in, JDBC
>    Affects Versions: 10.7.1.1
>         Environment: Windows XP environment
>            Reporter: George Xu
>
> When a very large table (500k rows) is used with a column is indexed.   select * from
tab where pid > 0 order by pid takes extremely longer time than select * from tab order
by pid.  Actually, it is 100 times slower.  However, in IJ, ther performance seems to be similar.
 PID column is indexed.
> Here is the code snipplet
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.SQLException;
> import java.sql.Statement;
> public class TestPerformance {
> 	//20343 mill-sec
> 	static String s1 = "SELECT TIMESTAMP, HOSTPORT AS \"HOST ID\", PID, SESSIONID, REQUESTID,
" +
> 	"SUBREQUESTID, STEPID, TID, COMPONENT, BUILDNUM, " +
> 	"LOGLEVELORIG AS \"LEVEL\", LOGGER, OPERATION, OBJECTTYPE, OBJECTPATH, " +
> 	"STATUS, MESSAGE, DATA, NDX FROM LOGDATA871218 where PID > 0 ORDER BY PID";	
> 	//297 million sec.
> 	static String s2 = "SELECT TIMESTAMP, HOSTPORT AS \"HOST ID\", PID, SESSIONID, REQUESTID,
" +
> 	"SUBREQUESTID, STEPID, TID, COMPONENT, BUILDNUM, " +
> 	"LOGLEVELORIG AS \"LEVEL\", LOGGER, OPERATION, OBJECTTYPE, OBJECTPATH, " +
> 	"STATUS, MESSAGE, DATA, NDX FROM LOGDATA871218 ORDER BY PID";	
> 	public static void main(String[] args) throws InstantiationException, IllegalAccessException,
ClassNotFoundException {
> 		Statement stmt3;
> 		try {
> 			//connect'jdbc:derby:C:/devroot/runtime-New_configuration/LogXData';
> 			String db = "C:/devroot/runtime-New_configuration/LogXData";
> 			String driver = "org.apache.derby.jdbc.EmbeddedDriver";
> 			Class.forName(driver).newInstance();
> 			Connection con = DriverManager.getConnection("jdbc:derby:"+db);			
> 			stmt3 = con.createStatement(ResultSet.FETCH_FORWARD, 
> 			        ResultSet.TYPE_FORWARD_ONLY);
> 			long startTime3 = System.currentTimeMillis();
> 			ResultSet rs3 = stmt3.executeQuery(s1); 
> 			long elapsed3 = System.currentTimeMillis() - startTime3;
> 			System.out.println("Statment.executeQuery Dup: " + elapsed3); 
> 		} catch (SQLException e) {
> 			// TODO Auto-generated catch block
> 			e.printStackTrace();
> 		}			
> 	}
> }

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message