db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bryan Pendleton <bpendle...@amberpoint.com>
Subject Re: URGENT!!! JDBC SQL query taking long time for large IN clause
Date Tue, 07 Apr 2009 14:24:55 GMT
> I would like my query below to return within 100 millisecs. Please help me,

Here's what Derby is doing with your query:

1) It scans the object_master_index looking for the object_id values in
your IN list. You provided 1,000 values. Derby sorts your list, finds the lowest
value, and the highest value, and scans that section of the index. During this
processing, Derby looks at 104,896 entries in the index, and discards 103,896
of them, resulting in 1,000 index entries. This requires touching 566 index pages,
which I think are read somewhat randomly, and takes 30 seconds to process:

 >                         Rows seen = 104896
 >                         Rows filtered = 103896
 >                             next time (milliseconds) = 31769
 >                         Source result set:
 >                             Index Scan ResultSet for OBJECT_MASTER using
 >                             Rows seen = 104896
 >                                 Number of pages visited=566

Honestly, I'm not quite sure why this step takes 30 seconds. This is a puzzle to me.

2) Derby then takes those 1,000 rows, and joins them against object_category_mapping
using the object_category_mapping index. This results in 19,000 rows in the result,
which I guess means that each object belongs to 19 categories? This requires
performing 1,000 probes into the index, which each touch about 4 pages on average:

 >                         Index Scan ResultSet for OBJECT_CATEGORY_MAPPING
 > using index OBJECT_CATEGORY_MAPPING_INDEX at read committed isolation level
 >                         Number of opens = 1000
 >                         Rows seen = 19000
 >                             Number of pages visited=4119

3) The 19,000 rows resulting from the join are then sorted and grouped by
category_name, and the category_count is computed. This requires an
external sort, and results in 1084 category_name groups:

 >         Grouped Aggregate ResultSet:
 >         Rows input = 19000
 >         Sort information:
 >             Number of merge runs=1
 >             Number of rows input=19000
 >             Number of rows output=1084
 >             Size of merge runs=[18220]
 >             Sort type=external

4) lastly, the result is sorted in order to return the results ordered by count.
This is an small internal sort and is very fast.

 > Sort ResultSet:
 > Sort information:
 >     Number of rows input=1068
 >     Number of rows output=1068
 >     Sort type=internal

I'm a little confused about why the grouping appears to produce 1,084 rows,
but the order by only sees 1068. How many rows are in your final result?

This query plan, overall, seems pretty good to me. It doesn't look like Derby
did anything frightenly stupid.

However, I am confused as to why the initial scan of the OBJECT_MASTER for the
object_id values in the IN clause took 30 seconds.

What happens if you shrink your query down, and just do:

   select object_id from object_master where object_id in (?,...,?)

and pass it the exact same 1,000 values for the IN clause. Does that query
by itself also take 30 seconds?



View raw message