db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Suavi Ali Demir <dem...@yahoo.com>
Subject Re: derby performance and 'order by'
Date Mon, 19 Sep 2005 17:26:19 GMT
Actually, it sounds like the problem of finding top 1000 rows out of  166333 rows is different
than sorting 166333 rows and maybe it could be optimized. There is no need to sort all 166333
but the information that we are only looking 1000 rows would have to be passed all the way
down to the point where Derby decides to sort. I have not thought through the details of an
algorithm but when nRows we want is substantially smaller than TotalRows then i just feel
there should be a better way to pick those nRows. For example, if nRows were 1, then all we
had to do would be 1 single pass on 166333 rows to find the max. That is quite different than
sorting all and this idea should be possible to generalize on 1<=nRows<TotalRows.
Ali


Craig Russell <Craig.Russell@Sun.COM> wrote:
Hi Scott,

>From the query plan it appears that your filter selects 166,333 rows, of which you want
the first 1000 according to the ordering of the order_id column. You can see that this is
an effective strategy because             Number of rows qualified=166333 Number of rows visited=166333.
There's no time lost visiting rows that don't qualify.


The database has to sort the 166,333 rows because the results are ordered according to the
index scan column "time" not according to the order_id column. All of the rows need to be
sorted even though you only want the first 1000 rows. I'd guess that the sorting of the 166,333
rows is what accounts for the 15 second delay you are experiencing.


The index on order_id doesn't do you any good because you have a result that isn't indexed
on order_id. If this isn't obvious, try to think of an algorithm that would use the order_id
index on the result set.


Craig

On Sep 19, 2005, at 9:29 AM, scotto wrote:

So for the second query: 


select * from orders
where time > '10/01/2002' and time < '11/30/2002'
order by order_id;


the query plan shows that the index IX_ORDERS_TIME is used to filter the
result set by time.  The order by step does not use the primary key index to
sort the results after the filter step.  My questions: 


--Is it correct that the sort step not use the primary key index in this
case?  


--Why is it not possible to use the index on order_id to sort after the
filter has happened? 


Here is the query plan: 


Statement Name: 
    null
Statement Text: 
    select * from orders 
where time > '10/01/2002' and time < '11/30/2002'
order by order_id
Parse Time: 0
Bind Time: 0
Optimize Time: 0
Generate Time: 0
Compile Time: 0
Execute Time: 14329
Begin Compilation Timestamp : null
End Compilation Timestamp : null
Begin Execution Timestamp : 2005-09-19 09:20:06.171
End Execution Timestamp : 2005-09-19 09:20:20.5
Statement Execution Plan Text: 
Sort ResultSet:
Number of opens = 1
Rows input = 166333
Rows returned = 1000
Eliminate duplicates = false
In sorted order = false
Sort information: 
    Number of merge runs=1
    Number of rows input=166333
    Number of rows output=166333
    Size of merge runs=[93695]
    Sort type=external
    constructor time (milliseconds) = 0
    open time (milliseconds) = 14297
    next time (milliseconds) = 32
    close time (milliseconds) = 0
    optimizer estimated row count:        78377.51
    optimizer estimated cost:       166745.12


Source result set:
    Index Row to Base Row ResultSet for ORDERS:
    Number of opens = 1
    Rows seen = 166333
    Columns accessed from heap = {0, 1, 2, 3, 4, 5, 6}
        constructor time (milliseconds) = 0
        open time (milliseconds) = 0
        next time (milliseconds) = 10488
        close time (milliseconds) = 0
        optimizer estimated row count:        78377.51
        optimizer estimated cost:       166745.12


        Index Scan ResultSet for ORDERS using index IX_ORDERS_TIME
at read committed isolation level using instantaneous share row locking
chosen by the optimizer
        Number of opens = 1
        Rows seen = 166333
        Rows filtered = 0
        Fetch Size = 16
            constructor time (milliseconds) = 0
            open time (milliseconds) = 0
            next time (milliseconds) = 3438
            close time (milliseconds) = 0
            next time in milliseconds/row = 0


        scan information: 
            Bit set of columns fetched=All
            Number of columns fetched=2
            Number of deleted rows visited=0
            Number of pages visited=887
            Number of rows qualified=166333
            Number of rows visited=166333
            Scan type=btree
            Tree height=3
            start position: 
    > on first 1 column(s).
    Ordered null semantics on the following columns: 


            stop position: 
    >= on first 1 column(s).
    Ordered null semantics on the following columns: 


            qualifiers:
None
            optimizer estimated row count:        78377.51
            optimizer estimated cost:       166745.12 








--scott






-----Original Message-----
From: Sunitha Kambhampati [mailto:ksunithaghm@gmail.com] 
Sent: Friday, September 16, 2005 5:55 PM
To: Derby Discussion
Subject: Re: derby performance and 'order by'


Scott Ogden wrote:



I have observed some interesting query performance behavior and am 
hoping someone here can explain.


In my scenario, it appears that an existing index is not being used 
for the 'order by' part of the operation and as a result the 
performance of certain queries is suffering.


Can someone explain if this is supposed to be what is happening and 
why? Please see below for the specific queries and their performance 
characteristics.


Here are the particulars:


---------------------------------


create table orders(


order_id varchar(50) NOT NULL


CONSTRAINT ORDERS_PK PRIMARY KEY,


amount numeric(31,2),


time date,


inv_num varchar(50),


line_num varchar(50),


phone varchar(50),


prod_num varchar(50));


--Load a large amount of data (720,000 records) into the 'orders' table


--Create an index on the time column as that will be used in the 
'where' clause.


create index IX_ORDERS_TIME on orders(time);


--When I run a query against this table returning top 1,000 records, 
this query returns very quickly, consistently less than .010 seconds.


select * from orders


where time > '10/01/2002' and time < '11/30/2002'


order by time;


--Now run a similarly query against same table, returning the top 
1,000 records.


--The difference is that the results are now sorted by the primary key 
('order_id') rather than 'time'.


--This query returns slowly, approximately 15 seconds. Why??


select * from orders


where time > '10/01/2002' and time < '11/30/2002'


order by order_id;


--Now run a third query against the same 'orders' table, removing the 
where clause


--This query returns quickly, around .010 seconds.


select * from orders


order by order_id;


---------------------------------------------



If you run with derby.language.logQueryPlan=true, the actual query plans 
used for the following queries will be written to derby.log. This will 
show what indexes was used by the optimizer. Also see 
http://db.apache.org/derby/docs/10.1/tuning/rtunproper43414.html .


Query with 'order by' will require sorting. Usually, sorting requires an 
extra step to put the data into the right order. This extra step can be 
avoided for data that are already in the right order. For example, if a 
single-table query has an ORDER BY on a single column, and there is an 
index on that column, sorting can be avoided if Derby uses the index as 
the access path.


I think in case of your first and third query the optimizer will pick 
the available index thus probably avoiding requiring the sort step.


Your second query involves more work than the first query, since it has 
a search condition on time, and an order by order_id. Thus if the 
optimizer picks the index on time, that will involve a sort step on 
order_id.
____________


Thanks,
Sunitha.








Craig Russell

Architect, Sun Java Enterprise System http://java.sun.com/products/jdo

408 276-5638 mailto:Craig.Russell@sun.com

P.S. A good JDO? O, Gasp!




Mime
View raw message