On Sep = 19, 2005, at 10:26 AM, Suavi Ali Demir wrote:

Actually, it sounds like the problem of finding top = 1000 rows out of=A0 166333 rows is different than sorting 166333 rows = and maybe it could be optimized.

Indeed.=A0

There is no need to sort all 166333 but the = information that we are only looking 1000 rows would have to be passed = all the way down to the point where Derby decides to sort. I have not = thought through the details of an algorithm=A0but when nRows we want is = substantially smaller than TotalRows then i=A0just feel=A0there should = be a better way to pick those nRows. For example, if nRows were 1, then = all we had to do would be 1 single pass on 166333 rows to find the max. = That is quite different than sorting all and this idea should be = possible to generalize on = 1<=3DnRows<TotalRows.

I agree that this would be a = useful improvement. Now, how do we tell the back end that we want only = the first=A0 1000 rows?

Or more generally (as found = in competitive products):

How do we tell the back end = that we want to skip the first N and return the next M = rows?

Craig

Ali

Craig Russell <Craig.Russell@Sun.COM> wrote:
Hi Scott,

=46rom the query plan it = appears that your filter selects 166,333 rows, of which you want the = first 1000 according to the ordering of the order_id column. You can see = that this is an effective strategy because=A0=A0 =A0=A0 =A0=A0 =A0=A0 =A0Number = of rows qualified=3D166333=A0Number of rows visited=3D166333. = There's no time lost visiting rows that don't qualify.

The database has to sort = the 166,333 rows because the results are ordered according to the index = scan column "time" not according to the order_id column. All of the rows = need to be sorted even though you only want the first 1000 rows. I'd = guess that the sorting of the 166,333 rows is what accounts for the 15 = second delay you are experiencing.

The index on order_id = doesn't do you any good because you have a result that isn't indexed on = order_id. If this isn't obvious, try to think of an algorithm that would = use the order_id index on the result set.

Craig

=
On Sep 19, 2005, at 9:29 AM, scotto wrote:

So for the second query:=A0

select * from = orders
where time > '10/01/2002' and = time < '11/30/2002'
order by = order_id;

=
the query plan shows that the index = IX_ORDERS_TIME is used to filter the
result set by time.=A0 = The order by step does not use the primary key index to
=
sort the results after the filter step.=A0 My questions:=A0

--Is it = correct that the sort step not use the primary key index in this
=
case? =A0

--Why is it = not possible to use the index on order_id to sort after the
filter has happened?=A0

Here is the = query plan:=A0

Statement Name:=A0
=
=A0 =A0 null
Statement Text:=A0
=A0 =A0 select * from = orders=A0
where time > '10/01/2002' and time < = '11/30/2002'
order by order_id
=
Parse Time: 0
Bind Time: 0
Optimize Time: = 0
Generate Time: 0
Compile Time: 0
Execute Time: 14329
Begin = Compilation Timestamp : null
End = Compilation Timestamp : null
Begin = Execution Timestamp : 2005-09-19 09:20:06.171
End Execution Timestamp : 2005-09-19 09:20:20.5
Statement Execution Plan Text:=A0
Sort ResultSet:
Number of opens =3D = 1
Rows input =3D 166333
Rows returned =3D 1000
Eliminate duplicates =3D false
In = sorted order =3D false
Sort = information:=A0
=A0 =A0 Number of merge = runs=3D1
=A0 = =A0 Number of rows input=3D166333
=A0 =A0 Number of rows = output=3D166333
=A0 = =A0 Size of merge runs=3D[93695]
=A0 =A0 Sort = type=3Dexternal
=A0 = =A0 constructor time (milliseconds) =3D 0
=A0 =A0 open time = (milliseconds) =3D 14297
=A0 = =A0 next time (milliseconds) =3D 32
=A0 =A0 close time = (milliseconds) =3D 0
=A0 = =A0 optimizer estimated row count:=A0 =A0 =A0 =A0 78377.51
=
=A0 =A0 optimizer estimated = cost: =A0 =A0 =A0 = 166745.12

Source result set:
=A0 =A0 Index Row to Base = Row ResultSet for ORDERS:
=A0 = =A0 Number of opens =3D 1
=A0 =A0 Rows seen =3D = 166333
=A0 = =A0 Columns accessed from heap =3D {0, 1, 2, 3, 4, 5, = 6}
=A0 = =A0 =A0 =A0 constructor time (milliseconds) =3D 0
=
=A0 =A0 =A0 =A0 open time = (milliseconds) =3D 0
=A0 = =A0 =A0 =A0 next time (milliseconds) =3D 10488
=A0 =A0 =A0 =A0 close time = (milliseconds) =3D 0
=A0 = =A0 =A0 =A0 optimizer estimated row count:=A0 =A0 =A0 =A0 78377.51
=
=A0 =A0 =A0 =A0 optimizer = estimated cost: =A0 =A0 =A0 = 166745.12

=A0 = =A0 =A0 =A0 Index Scan ResultSet for ORDERS using index = IX_ORDERS_TIME
at read committed = isolation level using instantaneous share row locking
chosen by the optimizer
=A0 =A0 =A0 =A0 Number of = opens =3D 1
=A0 = =A0 =A0 =A0 Rows seen =3D 166333
=A0 =A0 =A0 =A0 Rows = filtered =3D 0
=A0 = =A0 =A0 =A0 Fetch Size =3D 16
=A0 =A0 =A0 =A0 =A0 =A0 = constructor time (milliseconds) =3D 0
=A0 =A0 =A0 =A0 =A0 =A0 open = time (milliseconds) =3D 0
=A0 = =A0 =A0 =A0 =A0 =A0 next time (milliseconds) =3D = 3438
=A0 = =A0 =A0 =A0 =A0 =A0 close time (milliseconds) =3D 0
=
=A0 =A0 =A0 =A0 =A0 =A0 next = time in milliseconds/row =3D 0

=A0 = =A0 =A0 =A0 scan information:=A0
=A0 =A0 =A0 =A0 =A0 =A0 Bit = set of columns fetched=3DAll
=A0 = =A0 =A0 =A0 =A0 =A0 Number of columns fetched=3D2
=
=A0 =A0 =A0 =A0 =A0 =A0 = Number of deleted rows visited=3D0
=A0 =A0 =A0 =A0 =A0 =A0 = Number of pages visited=3D887
=A0 =A0 =A0 =A0 =A0 =A0 = Number of rows qualified=3D166333
=A0 =A0 =A0 =A0 =A0 =A0 = Number of rows visited=3D166333
=A0 =A0 =A0 =A0 =A0 =A0 Scan = type=3Dbtree
=A0 = =A0 =A0 =A0 =A0 =A0 Tree height=3D3
=A0 =A0 =A0 =A0 =A0 =A0 = start position:=A0
=A0 =A0 > on first 1 = column(s).
=A0 = =A0 Ordered null semantics on the following columns:=A0

=A0 = =A0 =A0 =A0 =A0 =A0 stop position:=A0
=A0 =A0 >=3D on first 1 = column(s).
=A0 = =A0 Ordered null semantics on the following columns:=A0

=A0 = =A0 =A0 =A0 =A0 =A0 qualifiers:
None
=A0 = =A0 =A0 =A0 =A0 =A0 optimizer estimated row count:=A0 =A0 =A0 =A0 78377.51
=
=A0 =A0 =A0 =A0 =A0 =A0 = optimizer estimated cost: =A0 =A0 =A0 166745.12=A0

=

--scott

=
-----Original Message-----
From: Sunitha Kambhampati [mailto:ksunithaghm@gmail.com]=A0
Sent: Friday, September 16, 2005 5:55 PM
To: Derby Discussion
Subject: Re: = derby performance and 'order by'

Scott Ogden = wrote:

=
I have observed = some interesting query performance behavior and am=A0
hoping someone here can explain.

In my scenario, = it appears that an existing index is not being used=A0
for the 'order by' part of the operation and as a result the=A0
performance of certain queries is suffering.

Can someone explain if this is supposed to be what is happening = and=A0
why? Please see below for the specific queries and = their performance=A0
=
characteristics.

Here are the particulars:

---------------------------------

create table = orders(

=
order_id varchar(50) NOT NULL

CONSTRAINT ORDERS_PK PRIMARY KEY,

amount = numeric(31,2),

time date,

inv_num varchar(50),

line_num varchar(50),
=

phone varchar(50),

prod_num = varchar(50));

--Load a large amount of data = (720,000 records) into the 'orders' table

--Create an = index on the time column as that will be used in the=A0
'where' clause.

create index IX_ORDERS_TIME = on orders(time);

--When I run a query against = this table returning top 1,000 records,=A0
this query returns very quickly, consistently less than .010 = seconds.

=
select * from orders

where time > '10/01/2002' and time < '11/30/2002'

order by time;

--Now run a similarly query = against same table, returning the top=A0
1,000 records.

--The difference is that the = results are now sorted by the primary key=A0
('order_id') rather than 'time'.

--This query = returns slowly, approximately 15 seconds. Why??

select * from orders

where time > '10/01/2002' = and time < '11/30/2002'

order by order_id;

--Now run a third query against the same 'orders' table, removing = the=A0
where clause

--This query returns = quickly, around .010 seconds.

select * from = orders

=
order by order_id;

---------------------------------------------

If you run with derby.language.logQueryPlan=3Dtrue, = the actual query plans=A0
used for the following queries will be = written to derby.log. This will=A0
show what indexes was used by the optimizer. Also see=A0

Query with 'order by' will require sorting. = Usually, sorting requires an=A0
extra step to put the data into the right order. This extra step = can be=A0
avoided for data that are already in the right = order. For example, if a=A0
single-table query has an ORDER BY on a single column, and there is = an=A0
index on that column, sorting can be avoided if = Derby uses the index as=A0
the access path.

I think in case of your first = and third query the optimizer will pick=A0
the available index thus probably avoiding requiring the sort = step.

Your second query involves more work than the = first query, since it has=A0
a search condition on time, and an order by order_id. Thus if = the=A0
optimizer picks the index on time, that will = involve a sort step on=A0
order_id.
____________

Thanks,
Sunitha.

Craig Russell
Architect, Sun Java Enterprise System http://java.sun.com/products/jdo=
P.S. A good JDO? O, = Gasp!

Craig Russell

Architect, Sun Java = Enterprise System http://java.sun.com/products/jdo=

408 = 276-5638 mailto:Craig.Russell@sun.com

P.S. A = good JDO? O, Gasp!

= --Apple-Mail-64-668931342-- --Apple-Mail-65-668932093 Content-Transfer-Encoding: base64 Content-Type: application/pkcs7-signature; name=smime.p7s Content-Disposition: attachment; filename=smime.p7s MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIGHjCCAtcw ggJAoAMCAQICAw3FWTANBgkqhkiG9w0BAQQFADBiMQswCQYDVQQGEwJaQTElMCMGA1UEChMcVGhh d3RlIENvbnN1bHRpbmcgKFB0eSkgTHRkLjEsMCoGA1UEAxMjVGhhd3RlIFBlcnNvbmFsIEZyZWVt YWlsIElzc3VpbmcgQ0EwHhcNMDUwMTEwMDA0MTA5WhcNMDYwMTEwMDA0MTA5WjBHMR8wHQYDVQQD ExZUaGF3dGUgRnJlZW1haWwgTWVtYmVyMSQwIgYJKoZIhvcNAQkBFhVDcmFpZy5SdXNzZWxsQFN1 bi5DT00wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDti7ZE4rO6oXKbLM02AG9WY55t udmBVL53fb3V3X5S1kvcJOk1NEMIYT/T7Ww+/hE955zvHT29+mIoNe8AW/yj1WUH8uGG2HxhwCHI UQTHmN/ioVJgjwUaYbtNMKbL/NRpnL0QWewdMJS+6IFzFyX7ADFW5cJ+UWNLvNeWAQtN0mtLildn vdOgh50i8YPvACNkCHoomGjXx0azcXbe1X3c5AgRI6e2CZe5k2lRFQFUMqkjdoMtQPoNqJ1BxH9l i4cnabl8mcTwHHl44hrvb8ThqwRf2pfJh2vVuwmgK6z4IWjOk4RQM+0ODsRdq67mBdimJYmPMK1p RMBHzrUsfKxNAgMBAAGjMjAwMCAGA1UdEQQZMBeBFUNyYWlnLlJ1c3NlbGxAU3VuLkNPTTAMBgNV HRMBAf8EAjAAMA0GCSqGSIb3DQEBBAUAA4GBAIj86LzxCHedStDLMEeqHLy+UFG7zIRHfChSIV42 +MvXicydXEBh8v0Ry1V2d/lY4jS78G5yW5R9fKt1U5nlRBCOVzdhomvSolnNRIT71wPVVDrAIVlA YpXKxSmVBq7+4hV+3ZLHDeq3qZnNmiJR0sTEUD16xZX1RJs9dRYPCHoRMIIDPzCCAqigAwIBAgIB DTANBgkqhkiG9w0BAQUFADCB0TELMAkGA1UEBhMCWkExFTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTES MBAGA1UEBxMJQ2FwZSBUb3duMRowGAYDVQQKExFUaGF3dGUgQ29uc3VsdGluZzEoMCYGA1UECxMf Q2VydGlmaWNhdGlvbiBTZXJ2aWNlcyBEaXZpc2lvbjEkMCIGA1UEAxMbVGhhd3RlIFBlcnNvbmFs IEZyZWVtYWlsIENBMSswKQYJKoZIhvcNAQkBFhxwZXJzb25hbC1mcmVlbWFpbEB0aGF3dGUuY29t MB4XDTAzMDcxNzAwMDAwMFoXDTEzMDcxNjIzNTk1OVowYjELMAkGA1UEBhMCWkExJTAjBgNVBAoT HFRoYXd0ZSBDb25zdWx0aW5nIChQdHkpIEx0ZC4xLDAqBgNVBAMTI1RoYXd0ZSBQZXJzb25hbCBG cmVlbWFpbCBJc3N1aW5nIENBMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDEpjxVc1X7TrnK mVoeaMB1BHCd3+n/ox7svc31W/Iadr1/DDph8r9RzgHU5VAKMNcCY1osiRVwjt3J8CuFWqo/cVbL rzwLB+fxH5E2JCoTzyvV84J3PQO+K/67GD4Hv0CAAmTXp6a7n2XRxSpUhQ9IBH+nttE8YQRAHmQZ cmC3+wIDAQABo4GUMIGRMBIGA1UdEwEB/wQIMAYBAf8CAQAwQwYDVR0fBDwwOjA4oDagNIYyaHR0 cDovL2NybC50aGF3dGUuY29tL1RoYXd0ZVBlcnNvbmFsRnJlZW1haWxDQS5jcmwwCwYDVR0PBAQD AgEGMCkGA1UdEQQiMCCkHjAcMRowGAYDVQQDExFQcml2YXRlTGFiZWwyLTEzODANBgkqhkiG9w0B AQUFAAOBgQBIjNFQg+oLLswNo2asZw9/r6y+whehQ5aUnX9MIbj4Nh+qLZ82L8D0HFAgk3A8/a3h YWLD2ToZfoSxmRsAxRoLgnSeJVCUYsfbJ3FXJY3dqZw5jowgT2Vfldr394fWxghOrvbqNOUQGls1 TXfjViF4gtwhGTXeJLHTHUb/XV9lTzGCAucwggLjAgEBMGkwYjELMAkGA1UEBhMCWkExJTAjBgNV BAoTHFRoYXd0ZSBDb25zdWx0aW5nIChQdHkpIEx0ZC4xLDAqBgNVBAMTI1RoYXd0ZSBQZXJzb25h bCBGcmVlbWFpbCBJc3N1aW5nIENBAgMNxVkwCQYFKw4DAhoFAKCCAVMwGAYJKoZIhvcNAQkDMQsG CSqGSIb3DQEHATAcBgkqhkiG9w0BCQUxDxcNMDUwOTE5MTc1NTAzWjAjBgkqhkiG9w0BCQQxFgQU /LbbNGV71ENrCQH2GsWX/55CqGYweAYJKwYBBAGCNxAEMWswaTBiMQswCQYDVQQGEwJaQTElMCMG A1UEChMcVGhhd3RlIENvbnN1bHRpbmcgKFB0eSkgTHRkLjEsMCoGA1UEAxMjVGhhd3RlIFBlcnNv bmFsIEZyZWVtYWlsIElzc3VpbmcgQ0ECAw3FWTB6BgsqhkiG9w0BCRACCzFroGkwYjELMAkGA1UE BhMCWkExJTAjBgNVBAoTHFRoYXd0ZSBDb25zdWx0aW5nIChQdHkpIEx0ZC4xLDAqBgNVBAMTI1Ro YXd0ZSBQZXJzb25hbCBGcmVlbWFpbCBJc3N1aW5nIENBAgMNxVkwDQYJKoZIhvcNAQEBBQAEggEA 5X5Z3u3hztlGIs2NP2+rHVJu5V2gB2jz912eMK8yuw6ICnlbaqzMAuyXQMz3inRnpCksv4s3weys IFK2Jm9DVidldns8whin+KjtPRoGFv7qOnT/jdHCfKdlMPXUQeEhcwmz1lBzKtcDLdnErteWk15F KOpLmHh+EATRiOw/PuC71jJACAruMrhsypOO8IK7M6v8BnX0EvkLJWax9XUpT6qSMIQYC+NyeMeZ uGzJ3rbLhivE62X3gYoXyv0/sexk+irqA3HHe7wpfMBDGF25QZbqQFV2V3x3YAbPx+G81jVKU/Z3 I8vG45yXKn28mdIStJY2vcoRSPfnFh/FLjBffQAAAAAAAA== --Apple-Mail-65-668932093--