Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 53362 invoked from network); 20 Sep 2005 17:20:10 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 20 Sep 2005 17:20:10 -0000 Received: (qmail 59755 invoked by uid 500); 20 Sep 2005 17:20:08 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 59728 invoked by uid 500); 20 Sep 2005 17:20:08 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 59717 invoked by uid 99); 20 Sep 2005 17:20:08 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 20 Sep 2005 10:20:08 -0700 X-ASF-Spam-Status: No, hits=0.9 required=10.0 tests=DNS_FROM_RFC_ABUSE,HTML_20_30,HTML_MESSAGE X-Spam-Check-By: apache.org Received-SPF: pass (asf.osuosl.org: local policy) Received: from [68.142.200.125] (HELO web30602.mail.mud.yahoo.com) (68.142.200.125) by apache.org (qpsmtpd/0.29) with SMTP; Tue, 20 Sep 2005 10:20:15 -0700 Received: (qmail 85595 invoked by uid 60001); 20 Sep 2005 17:19:49 -0000 DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:Received:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=XMT1xSrek5AtjHyoZMe4Y9t9JtgZrP0gOKk5CdoaUroxZsc3iclnFJ/y5yK7Ob6hXrAFw65F5JI+bk+94Lmxp/RcwjERk84MaLM2vFAWuFBISHXWVQ5g8ZbODqo7s5JI8QnvDRoU+W76ngtZlI7tyrvHI3kzkcBBuqDO0Atz3co= ; Message-ID: <20050920171949.85593.qmail@web30602.mail.mud.yahoo.com> Received: from [129.42.184.35] by web30602.mail.mud.yahoo.com via HTTP; Tue, 20 Sep 2005 10:19:49 PDT Date: Tue, 20 Sep 2005 10:19:49 -0700 (PDT) From: Suavi Ali Demir Subject: Re: derby performance and 'order by' To: Derby Discussion In-Reply-To: <4330384C.8000800@sbcglobal.net> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="0-1216346291-1127236789=:81937" Content-Transfer-Encoding: 8bit X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N --0-1216346291-1127236789=:81937 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit Another little detail about optimization is that Statement.setMaxRows() kind of functions on the JDBC side may not be sufficient since it is called after SQL statement is prepared and returned as an object (after query plan is built). Therefore, it may be necessary to have language syntax to indicate the intention to fetch first 1000 rows only, so that when the query is prepared, this intention can be taken into account. Regards, Ali Mike Matrigali wrote: As craig points out it is important in performance testing to say exactly what you are measuring. In general Derby will try to stream rows to the user before it has finished looking at all rows. So often looking at the first row will and stopping will mean that many rows have not been processed. BUT when an order by is involved and the query plan either has no appropriate matching index, or decides to use a different index then all the rows are processed, then they are sent to the sorter and finally after all rows are processed they are streamed to the client. So as you have seen reading the first 1000 rows of a much larger data set can happen very quickly. As subsequent mail threads have pointed out, returning the top 1000 sorted rows is an interesting problem which could be costed and executed differently if that information was pushed into the optimizer and the sorter (and medium level projects were done in those areas). scotto wrote: > The test was set up and run using the SQuirreL client, not ij. All 3 of > the queries return the top 1000 rows and the times I reported are to > return these top 1000 rows, not just the first row. > > > > ------------------------------------------------------------------------ > > *From:* Craig Russell [mailto:Craig.Russell@Sun.COM] > *Sent:* Saturday, September 17, 2005 2:35 PM > *To:* Derby Discussion > *Subject:* Re: derby performance and 'order by' > > > > Hi Scott, > > > > How have you set up the test? Are you using ij and displaying all of the > data or using jdbc to access the data? > > > > What do you do in 0.010 seconds? Do you read all of the rows into > memory, or just record the time until you get the first row? Are you > measuring the time taken to return all the rows or just the first row? > > Another reader has already commented on the fact that the second query > is doing a lot more work than the first. The second query must sort the > results after filtering the data, whereas the first and third queries > can simply use the indexes and filter on the fly. > > > > I'm a little suspicious of the third query returning 720,000 results in > 0.010 seconds. > > > > Craig > > > > On Sep 16, 2005, at 4:42 PM, Scott Ogden wrote: > > > > I have observed some interesting query performance behavior and am > hoping someone here can explain. > > > > In my scenario, it appears that an existing index is not being used for > the �order by� part of the operation and as a result the performance of > certain queries is suffering. Can someone explain if this is supposed > to be what is happening and why? Please see below for the specific > queries and their performance characteristics. > > > > > > > > Here are the particulars: > > --------------------------------- > > > > > > create table orders( > > order_id varchar(50) NOT NULL > > CONSTRAINT ORDERS_PK PRIMARY KEY, > > amount numeric(31,2), > > time date, > > inv_num varchar(50), > > line_num varchar(50), > > phone varchar(50), > > prod_num varchar(50)); > > > > > > --Load a large amount of data (720,000 records) into the �orders� table > > > > > > --Create an index on the time column as that will be used in the �where� > clause. > > > > create index IX_ORDERS_TIME on orders(time); > > > > > > --When I run a query against this table returning top 1,000 records, > this query returns very quickly, consistently less than .010 seconds. > >> >> >> >> >> select * from orders >> >> where time > '10/01/2002' and time < '11/30/2002' >> >> order by time; >> >> >> >> >> >> --Now run a similarly query against same table, returning the top >> 1,000 records. >> >> --The difference is that the results are now sorted by the primary key >> (�order_id�) rather than �time�. >> >> --This query returns slowly, approximately 15 seconds. Why?? >> >> >> >> >> >> select * from orders >> >> where time > '10/01/2002' and time < '11/30/2002' >> >> order by order_id; >> >> >> >> >> >> --Now run a third query against the same �orders� table, removing the >> where clause >> >> --This query returns quickly, around .010 seconds. >> >> >> >> select * from orders >> >> order by order_id; >> >> >> >> --------------------------------------------- >> >> >> >> >> >> >> >> >> >> >> > > > Craig Russell > > Architect, Sun Java Enterprise System http://java.sun.com/products/jdo > > 408 276-5638 mailto:Craig.Russell@sun.com > > P.S. A good JDO? O, Gasp! > > > --0-1216346291-1127236789=:81937 Content-Type: text/html; charset=iso-8859-1 Content-Transfer-Encoding: 8bit
Another little detail about optimization is that Statement.setMaxRows() kind of functions on the JDBC side may not be sufficient since it is called after SQL statement is prepared and returned as an object (after query plan is built). Therefore, it may be necessary to have language syntax to indicate the intention to fetch first 1000 rows only, so that when the query is prepared, this intention can be taken into account.
Regards,
Ali


Mike Matrigali <mikem_app@sbcglobal.net> wrote:
As craig points out it is important in performance testing to say
exactly what you are measuring. In general Derby will try to
stream rows to the user before it has finished looking at all rows.
So often looking at the first row will and stopping will mean that
many rows have not been processed. BUT when an order by is involved
and the query plan either has no appropriate matching index, or decides
to use a different index then all the rows are processed, then they are
sent to the sorter and finally after all rows are processed they are
streamed to the client.

So as you have seen reading the first 1000 rows of a much larger data
set can happen very quickly.

As subsequent mail threads have pointed out, returning the top 1000
sorted rows is an interesting problem which could be costed and executed
differently if that information was pushed into the optimizer and the
sorter (and medium level projects were done in those areas).


scotto wrote:
> The test was set up and run using the SQuirreL client, not ij. All 3 of
> the queries return the top 1000 rows and the times I reported are to
> return these top 1000 rows, not just the first row.
>
>
>
> ------------------------------------------------------------------------
>
> *From:* Craig Russell [mailto:Craig.Russell@Sun.COM]
> *Sent:* Saturday, September 17, 2005 2:35 PM
> *To:* Derby Discussion
> *Subject:* Re: derby performance and 'order by'
>
>
>
> Hi Scott,
>
>
>
> How have you set up the test? Are you using ij and displaying all of the
> data or using jdbc to access the data?
>
>
>
> What do you do in 0.010 seconds? Do you read all of the rows into
> memory, or just record the ti me until you get the first row? Are you
> measuring the time taken to return all the rows or just the first row?
>
> Another reader has already commented on the fact that the second query
> is doing a lot more work than the first. The second query must sort the
> results after filtering the data, whereas the first and third queries
> can simply use the indexes and filter on the fly.
>
>
>
> I'm a little suspicious of the third query returning 720,000 results in
> 0.010 seconds.
>
>
>
> Craig
>
>
>
> On Sep 16, 2005, at 4:42 PM, Scott Ogden wrote:
>
>
>
> I have observed some interesting query performance behavior and am
> hoping someone here can explain.
>
>
>
> In my scenario, it appears that an existing index is not being used for
> the �order by� part of the operation and as a result the perfo rmance of
> certain queries is suffering. Can someone explain if this is supposed
> to be what is happening and why? Please see below for the specific
> queries and their performance characteristics.
>
>
>
>
>
>
>
> Here are the particulars:
>
> ---------------------------------
>
>
>
>
>
> create table orders(
>
> order_id varchar(50) NOT NULL
>
> CONSTRAINT ORDERS_PK PRIMARY KEY,
>
> amount numeric(31,2),
>
> time date,
>
> inv_num varchar(50),
>
> line_num varchar(50),
>
> phone varchar(50),
>
> prod_num varchar(50));
>
>
>
>
>
> --Load a large amount of data (720,000 records) into the �orders� table
>
>
>
>
>
> --Create an index on the time column as that will be used i n the �where�
> clause.
>
>
>
> create index IX_ORDERS_TIME on orders(time);
>
>
>
>
>
> --When I run a query against this table returning top 1,000 records,
> this query returns very quickly, consistently less than .010 seconds.
>
>>
>>
>>
>>
>> select * from orders
>>
>> where time > '10/01/2002' and time < '11/30/2002'
>>
>> order by time;
>>
>>
>>
>>
>>
>> --Now run a similarly query against same table, returning the top
>> 1,000 records.
>>
>> --The difference is that the results are now sorted by the primary key
>> (�order_id�) rather than �time�.
>>
>> --This query returns slowly, approximately 15 seconds. Why??
>>
>>
>>
>>
>>
>> select * from orders
>>
>> where time > '10/01/2002' and time < '11/30/2002'
>>
>> order by order_id;
>>
>>
>>
>>
>>
>> --Now run a third query against the same �orders� table, removing the
>> where clause
>>
>> --This query returns quickly, around .010 seconds.
>>
>>
>>
>> select * from orders
>>
>> order by order_id;
>>
>>
>>
>> ---------------------------------------------
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
> Craig Russell
>
> Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
>
> 408 276-5638 mailto:Craig.Russell@sun.com
>
> P.S. A good JDO? O, Gasp!
>
>
>
--0-1216346291-1127236789=:81937--