db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Satheesh Bandaram <sathe...@Sourcery.Org>
Subject Re: Stack overflow from materialized subquery
Date Thu, 20 Oct 2005 22:35:05 GMT
Thanks Jeff, great analysis... It seems to me it was originally intended
to cache a small number of subquery result rows, but the code didn't
seem to implement it that way. Implementing the "optimization" at
runtime was probably chosen to make sure only small number of rows are
returned by the subquery. Like you said, using nested unions seems like
a bad idea too.

Any ideas on how to unoptimize without loosing performance gain some
queries might be getting now? It seems the "optimization" was done for a
specific customer query that is supposedly improved performance by 100


Jeffrey Lichtman wrote:

> I have figured out what's behind the stack overflow reported by Daniel
> Skiles in the derby-user list. To summarize, he gets a stack overflow
> from this query:
>     SELECT path from filsystemfiles where
>         path not in (select path from existingfiles)
> The stack trace shows a large number of UnionResultSets calling
> openCore(). The odd thing, of course, is that the query doesn't
> contain any unions.
> It turns out that, at some point, an optimization was added to
> subquery processing to materialize the subquery result in memory in
> certain cases. According to the comments, the implementor of this
> change was Tingjian Ge. The comments also refer to "beetle 4373",
> which I assume is a cute way of referring to a bug number.
> The in-memory materialization works by reading the rows into memory
> and constructing a set of nested UnionResultSets on the fly. This
> happens in BaseActivation.java in the language execution code, in the
> method materializeResultSetIfPossible(). This code tries to ensure
> that the rows will fit in memory, but it doesn't take the possibility
> of a stack overflow as a result of nested unions into account.
> The obvious solution is to get rid of the nested union technique in
> materializeResultSetIfPossible(), and to replace it with an iterative
> technique (as opposed to a recursive one). This should be pretty easy,
> as the method already puts the rows into a Vector before creating the
> unions.
> I should say, however, that some aspects of this optimization bother
> me. The biggest problem is that it could make some queries run much
> slower. Consider the following:
>     select * from one_row_table where column1 not in
>         (select column2 from million_row_table)
> The "unoptimized" query will read the single row from one_row_table
> and do a probe into million_row_table to see if it finds a match. If
> there is an index on million_row_table.column2, it will return a
> result very quickly despite the large size of million_row_table.
> With the materialization "optimization" it will read all of
> million_row_table into memory, even if there's an index on column2.
> This is obviously a big performance hit.
> Since subquery materialization of this type can be expensive, the
> decision as to whether to do it should be based on query optimizer
> cost estimates. The optimizer knows what the cost is of materializing
> a subquery, and also what the cost is of getting a row from an
> unmaterialized subquery.
> I also wonder about the wisdom of having the language cache partial
> query results in memory. Although this can help performance in some
> cases, it also chews up memory. This is different from a limited-size
> cache with a backing store (like what the store uses for page
> caching). The language has no way to limit the total amount of memory
> used in this type of processing.
> I also have some problems with the way this optimization was
> implemented. The decision to materialize the subquery results in
> memory is made during code generation - all such decisions should be
> made during the compilation phase. It's not clear to me why
> materializeResultSetIfPossible() is in BaseActivation - I would expect
> the of materialization to be done by a type of ResultSet, not by a
> method in BaseActivation. Also, this method calls
> getMaxMemoryPerTable() in the OptimizerFactory - nothing in the
> execution code should refer to anything in the compilation code
> (perhaps getMaxMemoryPerTable() should be moved somewhere else).
> Thoughts, anyone?
>                        -        Jeff Lichtman
>                                 swazoo@rcn.com
>                                 Check out Swazoo Koolak's Web Jukebox at
>                                 http://swazoo.com/

View raw message