db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-2499) Update Tuning Guide documentation to reflect the new static IN list transformation that occurs as a result of DERBY-47.
Date Wed, 11 Apr 2007 20:49:32 GMT

    [ https://issues.apache.org/jira/browse/DERBY-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488181
] 

A B commented on DERBY-2499:
----------------------------

I read through the sections highlighted by Laura and it looks like there is only one page
that needs to be updated to reflect the DERBY-47 changes.  The other pages either do not reference
IN lists or talk about optimizations/transformations that are not affected by DERBY-47.

The one page that needs to be updated is:

http://db.apache.org/derby/docs/dev/tuning/rtuntransform582.html

I think we should pretty much just delete the content of this page (except for the various
links to related topics) and rewrite the page as follows...

------- <begin-new-page> -------

Simple IN predicate transformations

A "simple" IN list predicate is one in which the left operand is a <simple column reference>
and the IN list is composed entirely of constants and/or parameter markers.  So all of the
following are simple IN predicates:

  ... where orig_airport IN ('ABQ', 'AKL', 'DSM')
  ... where orig_airport IN (?, ?, ?)
  ... where orig_airport IN ('ABQ', ?, ?, 'YYZ')

Derby will transform such IN list predicates into a single equality predicate whose right
operand is an internally created parameter marker.  This internal equality predicate is called
a "probe predicate".  So each of the above three IN predicates will all get transformed into
the following probe predicate:

  ... where orig_airport = ?

Note that probe predicates are treated differently than normal equality predicates.  So while
"orig_airport = ?" may look like an equality comparison to a single value specified by the
user, that is *not* the case.  Probe predicates undergo special treatment during optimization
and execution, as explained below.

During optimization Derby will look at the probe predicate to see if it is "useful" for limiting
the number of rows retrieved from disk.  In order for a probe predicate to be useful the following
must be true:

  R1. The table to which the left operand (i.e. the column reference) belongs must
     have at least one index defined such that the left operand is the *first*
     column in that index.

  R2. The estimated cost of an access path which uses the probe predicate and one
     of the corresponding indexes (R1) must be less than the estimated cost of any
     other access paths seen by the optimizer.

     Generally speaking this means that the number of values in the IN list is
     significantly less than the number of rows in the table to which the left
     operand belongs.

If both of these requirements are met then Derby will use the probe predicate at execution
time to "probe" the underlying index for values in the IN list. Or put differently, the right
operand of the probe predicate (the parameter) becomes a place-holder into which Derby will
plug the different values from the IN list, and then for each value it will read the matching
rows from the index.

As an example, assume the query to execute is:

  select flights.orig_airport, cities.city_name
  from flights, cities
    where flights.orig_airport IN ('ABQ', 'DSM', 'YYZ')
      and flights.orig_airport = cities.airport

Derby will internally transform this query into the following:

  select flights.orig_airport, cities.city_name
  from flights, cities
    where flights.orig_airport = ?
      and flights.orig_airport = cities.airport

In this transformed query "flights.orig_airport = ?" is an internal probe predicate.  

Now assume there is an index on the ORIG_AIRPORT column of FLIGHTS.  If it turns out that
the estimated cost of probing that index for the three values (ABQ, DSM, YYZ) is less than
the cost of accessing FLIGHTS in some other way, Derby will choose to perform execution-time
probing on the index.  This ensures that we only read the necessary rows from the FLIGHTS
table.

At a higher level Derby's index probing for IN lists is an internal way of evaluating the
transformed query multiple times, once for each value in the IN list.  So from a JDBC perspective
Derby is logically (but not actually) doing the following:

  PreparedStatement ps = conn.prepareStatement(
    "select flights.orig_airport, cities.city_name " +
    "from flights, cities " +
    "  where flights.orig_airport = ? " +
    "    and flights.orig_airport = cities.airport ");

  ps.setString(1, "ABQ");
  rs1 = ps.executeQuery();

  ps.setString(1, "DSM");
  rs2 = ps.executeQuery();

  ps.setString(1, "YYZ");
  rs3 = ps.executeQuery();

and then combining the three result sets (rs1, rs2, and rs3).  Another way to capture this
logic is with the following query:

  select flights.orig_airport, cities.city_name
  from flights, cities
    where flights.orig_airport = 'ABQ'
      and flights.orig_airport = cities.airport

  UNION ALL

  select flights.orig_airport, cities.city_name
  from flights, cities
    where flights.orig_airport = 'DSM'
      and flights.orig_airport = cities.airport

  UNION ALL

  select flights.orig_airport, cities.city_name
  from flights, cities
    where flights.orig_airport = 'YYZ'
      and flights.orig_airport = cities.airport

In each subquery the equality predicate limits the number of rows read from FLIGHTS, so we
avoid having to read unnecessary rows from disk. The larger the FLIGHTS table is, the more
time Derby will save by probing the index for the relatively few IN list values.

Note: Derby does *not* actually create the above UNION ALL query; that query just illustrates
the logical behavior of a Derby IN list "probe predicate".

With this approach we know that regardless of how large the base table is, Derby will only
have to probe the index a max of N times, where "N" is the size of the IN list. If N is significantly
less than the number of rows in the table, or is significantly less than the number of rows
between the minimum value and the maximum value in the IN list, this selective probing ensures
that Derby will not spend time reading unnecessary rows from disk.

If either of the two requirements (R1 or R2) above is not satisfied, Derby will discard the
internal probe predicate and will execute the query using the IN list predicate in its original
form.

-------- <end-new-page> --------

Note that "<simple column reference>" in the first paragraph should be linked to the
corresponding doc page, similar to what is currently done here:

  http://db.apache.org/derby/docs/dev/tuning/rtuntransform590.html

And of course, any pages which currently link to the old "Static IN predicate transformations"
page will have to be updated to point to the new page (i.e. their text should be changed to
"Simple IN predicate transformations").

As always, this is just a first attempt at capturing the relevant info.  I am open to suggestions
and improvements in wording or general clarity.  In particular I wonder: are the JDBC extract
and the UNION ALL query useful, or do they just make things more confusing...?

> Update Tuning Guide documentation to reflect the new static IN list transformation that
occurs as a result of DERBY-47.
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2499
>                 URL: https://issues.apache.org/jira/browse/DERBY-2499
>             Project: Derby
>          Issue Type: Sub-task
>          Components: Documentation
>    Affects Versions: 10.3.0.0
>            Reporter: A B
>         Assigned To: Laura Stewart
>            Priority: Minor
>
> DERBY-47 changed the static rewrite logic for IN lists.  Whereas we used to create a
BETWEEN predicate for IN-lists that contain all constants and then use that predicate to limit
the scan, we now create a "probe predicate" for IN-lists that contain all constants *and/or*
parameter nodes and then use that new predicate to perform execution-time probing.
> The documentation in the Tuning Guide needs to be updated to reflect this change in behavior.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message