db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mikkel Kamstrup Erlandsen <...@statsbiblioteket.dk>
Subject RE: Bad table design leads to bad performance...RE: Bad performance with UPDATE on a nested SELECT
Date Thu, 15 Jan 2009 08:41:12 GMT
On Wed, 2009-01-14 at 14:15 +0100, derby@segel.com wrote:
> 
> > -----Original Message-----
> > From: Mikkel Kamstrup Erlandsen [mailto:mke@statsbiblioteket.dk]
> > Sent: Wednesday, January 14, 2009 12:12 AM
> > To: Derby Discussion
> > Cc: msegel@segel.com
> > Subject: RE: Bad table design leads to bad performance...RE: Bad
> > performance with UPDATE on a nested SELECT
> [SNIP]
> > >
> > > As I said above, I just tried out your strategy. Using only integer
> > > handles the query runs in about 4s. I still need a factor 100 better
> > > than that...
> > 
> > Sorry, forgot to add that this was on a base with ~700k rows only, not
> > the full 1,5M rows one...
> > 
> > Cheers,
> > Mikkel
> 
> Mikkel,
> 
> Sorry if I'm being a bit slow...
> 
> That 4 seconds was on 700K rows using integers instead of varchars?
> Does that also include the optimizer hint or was that without the optimizer
> hint?

Yes. And it was without the optimizer hint.

> The reason I ask is that if your query was the same and you just switched
> from varchar to integers, then there clearly is an issue with varchars and
> the optimizer. 

Yeah, the query was the exact same, just with INTEGER instead of
VARCHAR(255) for the id column. For reference, the query we are talking
about:

UPDATE records
SET base='my_base'
WHERE id IN (
  SELECT parentId
  FROM relations
  WHERE childId='id_1');

I am not sure that the issue is about varchars. As I stated in an
earlier mail I had a running time of 5-10s on the 700k row base when I
was using varchars. This means that an identity index on an integer
column gives a small speed up (maybe 30% if I am to pull a random number
out of my hat). What I was hoping for was more like 3000%...

> If I understood you correctly, it sounded like when you tried the optimizer
> hint, using varchars, you got the same result as the integers. Is that the
> case?

More or less... The integer case might be slightly faster, but I don't
have precise timings to back this up. In any case the difference here is
very small.

Generally I would say that the optimizer hint doesn't provide any real
difference.

> If you're going to loop through the result set, you may find it faster to
> delete the old row and then perform the insert. Updates are hard(er) on a
> database.

I am just doing a manual loop over the SELECT result set now (it usually
contains 0-5 rows) and doing an update for each row. It works at an
acceptable speed. < 1ms for each op.

> Derby isn't Postgres and there are some neat things you can do in C that you
> can't do in Java that could give a C based database an edge. (Pointers can
> be your friend. ;-) At the same time, there are design decisions that could
> have been made which are hampering performance now. 

And C doesn't have a JIT compiler :-) Java programs can be blazingly
fast. Lucene impresses me on a daily basis :-)

I don't believe this issue is about Java. At this point I am quite
certain that it is a problem with the query optimizer in Derby.

Cheers,
Mikkel


Mime
View raw message