Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 24594 invoked from network); 12 Jul 2006 17:40:57 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 12 Jul 2006 17:40:57 -0000 Received: (qmail 20765 invoked by uid 500); 12 Jul 2006 17:40:55 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 20740 invoked by uid 500); 12 Jul 2006 17:40:54 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 20729 invoked by uid 99); 12 Jul 2006 17:40:54 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 12 Jul 2006 10:40:54 -0700 X-ASF-Spam-Status: No, hits=1.9 required=10.0 tests=DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST X-Spam-Check-By: apache.org Received-SPF: neutral (asf.osuosl.org: local policy) Received: from [32.97.182.145] (HELO e5.ny.us.ibm.com) (32.97.182.145) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 12 Jul 2006 10:40:52 -0700 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e5.ny.us.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id k6CHeT1H022602 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL) for ; Wed, 12 Jul 2006 13:40:30 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay04.pok.ibm.com (8.13.6/NCO/VER7.0) with ESMTP id k6CHeTo2289482 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 12 Jul 2006 13:40:29 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id k6CHeTG6022803 for ; Wed, 12 Jul 2006 13:40:29 -0400 Received: from [127.0.0.1] (sig-9-65-0-232.mts.ibm.com [9.65.0.232]) by d01av03.pok.ibm.com (8.12.11.20060308/8.12.11) with ESMTP id k6CHeQxq022606 for ; Wed, 12 Jul 2006 13:40:28 -0400 Message-ID: <44B53409.3070901@sbcglobal.net> Date: Wed, 12 Jul 2006 10:40:25 -0700 From: Mike Matrigali Reply-To: mikem_app@sbcglobal.net User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Derby Discussion Subject: Re: Strange performance issue with BLOB's References: <000301c6a576$77231670$717ba8c0@DB35TT0J> In-Reply-To: <000301c6a576$77231670$717ba8c0@DB35TT0J> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N I think the following is what is going on. The blobs are not really instantiated in memory in one sense, but the entire container is read through the cache a page at a time. The base container of a derby table is a sequence of pages. Derby stores blob data in the same container as the rest of the data, in a page link chain. There is a map of pages used only for space allocation, not page types. So in the scan case Derby visits every page in the container and then checks whether each page is a "main" page or not. Main pages are returned up the stack to access to process each row. In the case of your query no blob data is actually accessed so "access" never requests the column data for the blob data - but the whole table is still read through the cache. There is no support in the base container implementation to "skip" to the next main page - that is what indexes provide. This is an interesting case for the optimizer. My guess is that the optimizer never considers using an index if there is no where clause unless the index "covers" (includes all the columns of the query). I am not sure how hard it would be to get the optimizer to consider using the index in this case, and I am not sure if the existing costing handles this blob case - but from your experiment is sort of looks like the costing is right as the optimizer is choosing the index when you added the dummy where - I am sure the costing of that where indicated that it would scan the entire index. With performance issues like this a lot is made much more obvious if you can include the query plan of the 2 queries. Piet Blok wrote: > Hi all, > > When experimenting with BLOB's I ran into a performance issue > that I cannot completely explain, but it could be a bug. > > Given the following table: > > CREATE TABLE BLOB_TABLE ( > BLOB_ID BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, > INCREMENT BY 1), > BLOB_SIZE BIGINT NOT NULL, > BLOB_CRC BIGINT NOT NULL, > BLOB_NAME VARCHAR(255) NOT NULL, > BLOB_DATA BLOB(2G) NOT NULL, > UNIQUE (BLOB_CRC, BLOB_SIZE), > PRIMARY KEY (BLOB_ID) > ); > > which is populated with 27 rows, > where the sum of all BLOB sizes is 5,885,060,164 bytes > (about 200 MB average per BLOB, but ranging from 10 MB to 750 MB). > > Some queries on this table are executed really > fast (almost instantaneous response). > > However, the following query needs about 10 minutes to complete: > > SELECT BLOB_ID, BLOB_NAME, BLOB_SIZE, BLOB_CRC FROM BLOB_TABLE; > > I reasoned that maybe Derby is scanning the whole table > (including the blob contents) so I tried to add a dummy WHERE > clause (dummy because all BLOB_ID's are greater than 0) > to offer a clue as to what rows (all of course) are needed, > as follows > > SELECT BLOB_ID, BLOB_NAME, BLOB_SIZE, BLOB_CRC FROM BLOB_TABLE WHERE > BLOB_ID > 0; > > and that helped: instantaneous response. > > But I really think that the original query, > without the where clause, should not be any slower. > > > I am using Derby 10.1.3.1 embedded, Windows XP and Sun Java 1.5.0_06. > Both queries executed with a Statement, not a PreparedStatement. > > Kind regards, > > Piet Blok > > > >