Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 53532 invoked from network); 15 Jan 2009 08:41:46 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 15 Jan 2009 08:41:46 -0000 Received: (qmail 23713 invoked by uid 500); 15 Jan 2009 08:41:45 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 23680 invoked by uid 500); 15 Jan 2009 08:41:45 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 23671 invoked by uid 99); 15 Jan 2009 08:41:45 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 15 Jan 2009 00:41:45 -0800 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: local policy) Received: from [130.225.24.68] (HELO sbexch03.sb.statsbiblioteket.dk) (130.225.24.68) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 15 Jan 2009 08:41:35 +0000 Received: from [130.225.25.44] (130.225.25.44) by sbexch03.sb.statsbiblioteket.dk (130.225.24.68) with Microsoft SMTP Server id 8.1.336.0; Thu, 15 Jan 2009 09:41:14 +0100 Subject: RE: Bad table design leads to bad performance...RE: Bad performance with UPDATE on a nested SELECT From: Mikkel Kamstrup Erlandsen To: "msegel@segel.com" CC: 'Derby Discussion' In-Reply-To: <20090114141747.B25FB605EF@dbrack01.segel.com> References: <20090114141747.B25FB605EF@dbrack01.segel.com> Content-Type: text/plain Date: Thu, 15 Jan 2009 09:41:12 +0100 Message-ID: <1232008872.6328.33.camel@pc280> MIME-Version: 1.0 X-Mailer: Evolution 2.24.2 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org On Wed, 2009-01-14 at 14:15 +0100, derby@segel.com wrote: > > > -----Original Message----- > > From: Mikkel Kamstrup Erlandsen [mailto:mke@statsbiblioteket.dk] > > Sent: Wednesday, January 14, 2009 12:12 AM > > To: Derby Discussion > > Cc: msegel@segel.com > > Subject: RE: Bad table design leads to bad performance...RE: Bad > > performance with UPDATE on a nested SELECT > [SNIP] > > > > > > As I said above, I just tried out your strategy. Using only integer > > > handles the query runs in about 4s. I still need a factor 100 better > > > than that... > > > > Sorry, forgot to add that this was on a base with ~700k rows only, not > > the full 1,5M rows one... > > > > Cheers, > > Mikkel > > Mikkel, > > Sorry if I'm being a bit slow... > > That 4 seconds was on 700K rows using integers instead of varchars? > Does that also include the optimizer hint or was that without the optimizer > hint? Yes. And it was without the optimizer hint. > The reason I ask is that if your query was the same and you just switched > from varchar to integers, then there clearly is an issue with varchars and > the optimizer. Yeah, the query was the exact same, just with INTEGER instead of VARCHAR(255) for the id column. For reference, the query we are talking about: UPDATE records SET base='my_base' WHERE id IN ( SELECT parentId FROM relations WHERE childId='id_1'); I am not sure that the issue is about varchars. As I stated in an earlier mail I had a running time of 5-10s on the 700k row base when I was using varchars. This means that an identity index on an integer column gives a small speed up (maybe 30% if I am to pull a random number out of my hat). What I was hoping for was more like 3000%... > If I understood you correctly, it sounded like when you tried the optimizer > hint, using varchars, you got the same result as the integers. Is that the > case? More or less... The integer case might be slightly faster, but I don't have precise timings to back this up. In any case the difference here is very small. Generally I would say that the optimizer hint doesn't provide any real difference. > If you're going to loop through the result set, you may find it faster to > delete the old row and then perform the insert. Updates are hard(er) on a > database. I am just doing a manual loop over the SELECT result set now (it usually contains 0-5 rows) and doing an update for each row. It works at an acceptable speed. < 1ms for each op. > Derby isn't Postgres and there are some neat things you can do in C that you > can't do in Java that could give a C based database an edge. (Pointers can > be your friend. ;-) At the same time, there are design decisions that could > have been made which are hampering performance now. And C doesn't have a JIT compiler :-) Java programs can be blazingly fast. Lucene impresses me on a daily basis :-) I don't believe this issue is about Java. At this point I am quite certain that it is a problem with the query optimizer in Derby. Cheers, Mikkel