db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Arindam Bhattacharjee <mr.arindam.bhattachar...@gmail.com>
Subject Re: URGENT!!! JDBC SQL query taking long time for large IN clause
Date Tue, 07 Apr 2009 15:11:48 GMT
Hello Bryan,

query: select object_id from object_master where object_id in (? ..1000
times.. ?)
time taken to prepare statement = 328
time taken execute statement = 31
time taken enumerate result set = 31813 (isn't this unusual? enumeration of
results taking so much time?)

To answer your question regarding result set count for the previous query:
1069

Best regards,

Arindam

On Tue, Apr 7, 2009 at 7:54 PM, Bryan Pendleton
<bpendleton@amberpoint.com>wrote:

> I would like my query below to return within 100 millisecs. Please help me,
>>
>
> Here's what Derby is doing with your query:
>
> 1) It scans the object_master_index looking for the object_id values in
> your IN list. You provided 1,000 values. Derby sorts your list, finds the
> lowest
> value, and the highest value, and scans that section of the index. During
> this
> processing, Derby looks at 104,896 entries in the index, and discards
> 103,896
> of them, resulting in 1,000 index entries. This requires touching 566 index
> pages,
> which I think are read somewhat randomly, and takes 30 seconds to process:
>
> >                         Rows seen = 104896
> >                         Rows filtered = 103896
> >                             next time (milliseconds) = 31769
> >
> >                         Source result set:
> >                             Index Scan ResultSet for OBJECT_MASTER using
> >                             Rows seen = 104896
> >                                 Number of pages visited=566
>
> Honestly, I'm not quite sure why this step takes 30 seconds. This is a
> puzzle to me.
>
> 2) Derby then takes those 1,000 rows, and joins them against
> object_category_mapping
> using the object_category_mapping index. This results in 19,000 rows in the
> result,
> which I guess means that each object belongs to 19 categories? This
> requires
> performing 1,000 probes into the index, which each touch about 4 pages on
> average:
>
> >                         Index Scan ResultSet for OBJECT_CATEGORY_MAPPING
> > using index OBJECT_CATEGORY_MAPPING_INDEX at read committed isolation
> level
> >                         Number of opens = 1000
> >                         Rows seen = 19000
> >                             Number of pages visited=4119
>
> 3) The 19,000 rows resulting from the join are then sorted and grouped by
> category_name, and the category_count is computed. This requires an
> external sort, and results in 1084 category_name groups:
>
> >         Grouped Aggregate ResultSet:
> >         Rows input = 19000
> >         Sort information:
> >             Number of merge runs=1
> >             Number of rows input=19000
> >             Number of rows output=1084
> >             Size of merge runs=[18220]
> >             Sort type=external
>
> 4) lastly, the result is sorted in order to return the results ordered by
> count.
> This is an small internal sort and is very fast.
>
> > Sort ResultSet:
> > Sort information:
> >     Number of rows input=1068
> >     Number of rows output=1068
> >     Sort type=internal
>
> I'm a little confused about why the grouping appears to produce 1,084 rows,
> but the order by only sees 1068. How many rows are in your final result?
>
> This query plan, overall, seems pretty good to me. It doesn't look like
> Derby
> did anything frightenly stupid.
>
> However, I am confused as to why the initial scan of the OBJECT_MASTER for
> the
> object_id values in the IN clause took 30 seconds.
>
> What happens if you shrink your query down, and just do:
>
>  select object_id from object_master where object_id in (?,...,?)
>
> and pass it the exact same 1,000 values for the IN clause. Does that query
> by itself also take 30 seconds?
>
> thanks,
>
> bryan
>
>
>

Mime
View raw message