db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeff Lichtman (JIRA)" <derby-...@db.apache.org>
Subject [jira] Commented: (DERBY-634) Subquery materialization can cause stack overflow
Date Sat, 22 Oct 2005 22:01:05 GMT
    [ http://issues.apache.org/jira/browse/DERBY-634?page=comments#action_12332880 ] 

Jeff Lichtman commented on DERBY-634:

In response to Sateesh's question: it's hard to know how to fix this problem without causing
a performance regression without seeing the customer's original query.

One thing I find perplexing about this problem is that, by the time the compiler gets to code
generation (where the decision is made to materialize the subquery), the NOT IN subquery should
have been converted to a correlated subquery, and the optimization disallowed. That is:

   select  path from filesystemfiles where path not in
        (select path from existingfiles)

should have been converted to a query tree equivalent to something like this:

    select path from filesystemfiles where not exists
        (select * from existingfiles where existingfiles.path = filesystemfiles.path)

Thus the subquery should not be materialized in this case.

The rewritten query (as above) was suggested to the user who originally reported this problem.
The stack overflow didn't happen when he used this version of the query.

Another interesting thing is that there is other logic in SubqueryNode having to do with materializing
subqueries that seems to handle different cases than the one here.

> Subquery materialization can cause stack overflow
> -------------------------------------------------
>          Key: DERBY-634
>          URL: http://issues.apache.org/jira/browse/DERBY-634
>      Project: Derby
>         Type: Bug
>   Components: SQL
>     Versions:
>     Reporter: Jeff Lichtman

> A performance optimization in subquery processing can cause a stack overflow.
> The optimization materializes a subquery ResultSet in memory where it thinks the rows
will fit in memory. The materialization is done as a  set of  nested unions of constant rows
(UnionResultSets and RowResultSets). If there are a lot of rows this can cause a stack overflow
when fetching a row.
> The obvious fix is to make it use an iterative technique rather than a recursive one
for storing and returning the rows. See the method BaseActivation.materializeResultSetIfPossible()
in the language execution code.
> There are some other issues with this performance optimization that should be looked
> 1) The optimization can backfire, making the query run much slower. For example, in the
>     select * from one_row_table where column1 not in
>         (select column2 from million_row_table)
> reading million_row_table into memory is an expensive operation. If there is an index
on million_row_table.column2, the query should return a result very quickly despite the large
size of million_row_table by doing a single probe into million_row_table via the index.
> Since in-memory materialization can be an expensive operation, the decision about whether
to do it should be made based on query optimizer cost estimates. See SubqueryNode.generateExpression().
> 2) It may not be wise to cache partial query results in memory at all. Although this
can help performance in some cases, it also chews up memory. This is different from a limited-size
cache with a backing store (like what the store uses for page caching). The language has no
way to limit the total amount of memory used in this type of processing. Note that hash joins
originally used in-memory hash tables with no backing store, and that a backing store was
added later.
> 3) The implementation of this optimization has some problems. The decision to materialize
the subquery results in memory is made during code generation - all such decisions should
be made during the compilation phase. It's not clear to me why materializeResultSetIfPossible()
is in BaseActivation - I would expect the of materialization to be done by a type of ResultSet,
not by a method in BaseActivation. Also, this method calls getMaxMemoryPerTable() in the OptimizerFactory
- nothing in the execution code should refer to anything in the compilation code (perhaps
getMaxMemoryPerTable() should be moved somewhere else).

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
For more information on JIRA, see:

View raw message