db-derby-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Db-derby Wiki] Update of "DerbyBug47" by JamesSynge
Date Sun, 08 Oct 2006 17:09:05 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification.

The following page has been changed by JamesSynge:
http://wiki.apache.org/db-derby/DerbyBug47

------------------------------------------------------------------------------
  ||Literals||110472||7500||1021||
  ||Markers||14||3.4||3075||
  
- For the Markers query, it would appear that the IN list is treated as a single predicate
to be applied to each row in the selected index (i.e. for each value of the column found in
the index, each value in the IN list is compared with that column value).
+ The predicate in the Markers query is transformed to effectively be:
+ 
+ {{{ WHERE (column >= lowestINListValue AND column <= greatestINListValue AND column
IN (?, ..., ?) }}}
+ 
+ This immediately implies that we're examining all values of the column between {{{lowestINListValue}}}
(known
+ as the start key) and {{{greatestINListValue}}} (the stop key), even though there is no
reason to believe
+ these values are closely clustered.
+ Furthermore, it implies that we aren't doing N probes of the BTree index (i.e. one for each
entry
+ in the IN list), but rather that we're scanning a potentially large range of the BTree,
applying
+ the IN predicate to each row in the selected range (i.e. if we've got 4 entries in the IN
list,
+ and 10,000 index rows to examine, we'd need to perform up to 40,000 comparisons, instead
of performing
+ 4 probes of the BTree).
+ 
+ Debugging through the execution of the Markers query, I was very surprised to discover
+ that the situation is worse than the prior paragraph states: After fetching an index row
+ in the range [start key, stop key], Derby fetches the base table row before executing
+ the predicate to filter out unneeded base table rows.  This means that in the example above,
+ we load 10,000 base table rows, when we only expect to get 4 * some small number of rows
returned.
+ '''Sigh.'''
  
  For the case evaluated in the test program, it would be better to treat the IN list rather
like the outer table of a nested loop join (as the results below demonstrate), where each
parameter in the IN list is used to "probe" the index.
  

Mime
View raw message