db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Knut Anders Hatlen <Knut.Hat...@Sun.COM>
Subject Re: Nested query in conjunction with ordering - unreasonably slow
Date Mon, 04 Feb 2008 09:09:41 GMT
Daniel Noll <daniel@nuix.com> writes:

> Hi all.
>
> I've been trying to determine the cause for a slowness in a certain query.  
> I've managed to strip everything away and get to the core of what's making it 
> slow, but I'd like to understand why.
>
> The query is of this form:
>
>   SELECT jobs.id AS id,
>          jobs.name AS name,
>          (SELECT COUNT(1) FROM jobitems
>            WHERE jobitems.jobid = jobs.id) AS count
>     FROM jobs
>    ORDER BY jobs.name
>
> This takes around 3000ms to complete if there are 200,000 rows in jobitems.
>
> If I remove the "count" subquery, then it runs in 1ms.  If I remove the ORDER 
> BY, it runs in 1ms.
>
> There is only one row in jobs, thus I would expect the same performance 
> whether I order it or not.  The PK on jobitems is (jobid,itemid); prior posts 
> on this forum have led me to believe that this creates an implicit UNIQUE 
> INDEX on (jobid,itemid) and also an implicit INDEX on (jobid).  Whether this 
> is the case or not, creating an additional INDEX on (jobid) itself doesn't 
> speed things up.
>
> If I remove the count part and then execute it by itself, I get a total query 
> time of 3ms for the two queries.  No nested query should take longer than it 
> takes to execute its parts separately.
>
> I discovered this issue on version 10.3.1.4 but it still performs the
> same way under version 10.3.2.1.

It sounds like the optimizer is making a bad decision for some reason,
but I don't know why (perhaps the statistics are out of date? See
http://thread.gmane.org/gmane.comp.apache.db.derby.user/8100/focus=8101).
Since you seem to have a small reproducible test case for this problem,
it would be great if you could file a JIRA issue and upload one sql
script that creates the tables/indices and insert enough sample data to
show the problem, and one sql script that runs the different queries.

> The obvious workaround is not to order by name and sort the results on the 
> Java side.  But is there a way to rewrite this query that will make Derby 
> happier?

I don't know if it makes Derby happier, but I think this query does the
same thing without the nested query:

SELECT jobs.id, jobs.name, count(*) FROM jobs, jobitems
    WHERE jobs.id = jobitems.jobid
    GROUP BY jobs.id, jobs.name
    ORDER BY jobs.name

-- 
Knut Anders

Mime
View raw message