db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Stack overflow from materialized subquery
Date Thu, 20 Oct 2005 19:28:28 GMT
Hi Jeff,

It looks like you have identified a number of serious defects here. Your 
analysis seems solid to me and I doubt anyone is going to argue with 
you. Could you log a bug so we can fix these issues hopefully over the 
coming months?


Jeffrey Lichtman wrote:

> I have figured out what's behind the stack overflow reported by Daniel 
> Skiles in the derby-user list. To summarize, he gets a stack overflow 
> from this query:
>     SELECT path from filsystemfiles where
>         path not in (select path from existingfiles)
> The stack trace shows a large number of UnionResultSets calling 
> openCore(). The odd thing, of course, is that the query doesn't 
> contain any unions.
> It turns out that, at some point, an optimization was added to 
> subquery processing to materialize the subquery result in memory in 
> certain cases. According to the comments, the implementor of this 
> change was Tingjian Ge. The comments also refer to "beetle 4373", 
> which I assume is a cute way of referring to a bug number.
> The in-memory materialization works by reading the rows into memory 
> and constructing a set of nested UnionResultSets on the fly. This 
> happens in BaseActivation.java in the language execution code, in the 
> method materializeResultSetIfPossible(). This code tries to ensure 
> that the rows will fit in memory, but it doesn't take the possibility 
> of a stack overflow as a result of nested unions into account.
> The obvious solution is to get rid of the nested union technique in 
> materializeResultSetIfPossible(), and to replace it with an iterative 
> technique (as opposed to a recursive one). This should be pretty easy, 
> as the method already puts the rows into a Vector before creating the 
> unions.
> I should say, however, that some aspects of this optimization bother 
> me. The biggest problem is that it could make some queries run much 
> slower. Consider the following:
>     select * from one_row_table where column1 not in
>         (select column2 from million_row_table)
> The "unoptimized" query will read the single row from one_row_table 
> and do a probe into million_row_table to see if it finds a match. If 
> there is an index on million_row_table.column2, it will return a 
> result very quickly despite the large size of million_row_table.
> With the materialization "optimization" it will read all of 
> million_row_table into memory, even if there's an index on column2. 
> This is obviously a big performance hit.
> Since subquery materialization of this type can be expensive, the 
> decision as to whether to do it should be based on query optimizer 
> cost estimates. The optimizer knows what the cost is of materializing 
> a subquery, and also what the cost is of getting a row from an 
> unmaterialized subquery.
> I also wonder about the wisdom of having the language cache partial 
> query results in memory. Although this can help performance in some 
> cases, it also chews up memory. This is different from a limited-size 
> cache with a backing store (like what the store uses for page 
> caching). The language has no way to limit the total amount of memory 
> used in this type of processing.
> I also have some problems with the way this optimization was 
> implemented. The decision to materialize the subquery results in 
> memory is made during code generation - all such decisions should be 
> made during the compilation phase. It's not clear to me why 
> materializeResultSetIfPossible() is in BaseActivation - I would expect 
> the of materialization to be done by a type of ResultSet, not by a 
> method in BaseActivation. Also, this method calls 
> getMaxMemoryPerTable() in the OptimizerFactory - nothing in the 
> execution code should refer to anything in the compilation code 
> (perhaps getMaxMemoryPerTable() should be moved somewhere else).
> Thoughts, anyone?
>                        -        Jeff Lichtman
>                                 swazoo@rcn.com
>                                 Check out Swazoo Koolak's Web Jukebox at
>                                 http://swazoo.com/

View raw message