db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Stored Prepared Statements? -- Thoughts on exposing StorablePreparedStatements to user/external statements
Date Mon, 20 Apr 2009 22:48:38 GMT
I think any solution that requires less interaction from the client
is better and fits better with the zero admin goal of Derby.  I wonder
if it would be good to take the "store/retrieve" statement cache idea
one step further.

Rather than just persist the statement cache, instead create an indexed
table that acts as the backing store for the statement cache.  Seems 
like it should be sized bigger than the in memory statement cache and
the statement cache could then be changed to fault things from it when
it got a miss.

some problems:
1) probably don't want to store infinite plans, there are too many bad
    apps out there that for instance every insert is actually a different
    query.  So seems like it would be good to have a max size on the on
    disk cache.
2) maybe want to limit types of plans to look for.  Some options might 
be only select plans with N or greater tables in from list?  Or maybe
only plans that took longer than N ms. to compile (should figure out
what the overhead of doing a lookup and serializing in a plan from disk
3) It would be nice to not slow down existing compile of queries too 
much.  Again what is the cost of a lookup in the on disk table for a
4) Need to make sure validation of these plans work.  This was an area
    that caused a number of corrupt databases in cloudscape.  The basic 
problem was that we would use a stored plan incorrectly when it should
have been invalidated because some sort of ddl event happened since the
original compile.  Easiest way for this to lead to a corrupt db is that
an index was added, and the old compiled plan did not know about it and
thus when an insert happened the new index would not get an entry 
leading to a mismatch between new index and table.   This is not an 
issue for system stored plans that get invalidated every release.  There
must be a mechanism for invalidated the trigger ones, but I don't know 
if existing invalidation will work in all cases for non trigger use.

Knut Anders Hatlen wrote:
> Nick Puz <nick.derby@me.com> writes:
>>  Hi Knut,
>> Thanks for the response, I'll take a look into the cloudscape docs and
>> how execute is implemented. I'll also look more at the statement cache
>> to see about persisting it on shutdown and restoring on startup, so
>> that then it could be done globally for all statements (as an
>> option). Another alternative would be to let the user specify which to
>> persist as a comment in the sql (as is done with the optimizer
>> overrides to specify index usage, etc.), possibly as a different
>> runtime option. The advantage of the first (global) is that apps could
>> take advantage with no sql changes. Perhaps persist the N most
>> recently used statements...
> Hi Nick,
> I agree that no SQL changes required is an advantage. Applications could
> start taking advantage of it immediately with no changes (except perhaps
> setting a flag to enable the persisting of the statement cache), and no
> new, non-standard syntax is required.
> As to using comments, that would require changes in the SQL, but the SQL
> would still be portable since the comments would simply be ignored by
> other DMBSs.
> Adding the ability to store user-defined prepared statements with some
> special syntax is probably the easiest to implement, since most of the
> infrastructure is already in place, but then the applications would need
> to use unportable SQL. (Although we don't actually need to add new
> syntax. We already have EXECUTE STATEMENT, and we could just create a
> system procedure to store the statements, something like CALL
> T'). But even if the syntax is portable, the actual statement will still
> be tied to Derby.)
> So, not considering the complexity of the different implementations, a
> persistent statement cache and/or use of comments sounds more attractive
> to me.
> If we go for persisting the statement cache, I think it would be easier
> to just store the entire cache than to store the N most recently used,
> both because Derby's statement cache currently doesn't have a notion
> about "most recently used", and because the statement cache normally is
> so small (25 statement by default, I think) as I wouldn't think the
> extra complexity is worthwhile.
>> In terms of invalidating, a crude but simple way would be to have the
>> version of derby stored with the prepared statements and just to
>> discard them if the version numbers differ. The same mechanism coudl
>> be used as when deciding to recompile the system stored
>> statements. What do you think?
> Reusing as much as possible of the existing code would be good. So
> basically what needs to be done is
>   1) On shutdown, if persistent cache is enabled, store each of the
>   cached statements in SYS.SYSSTATEMENTS.
>   2) On startup, move the stored statements from SYS.SYSSTATEMENTS back
>   into the cache, and remove them from SYS.SYSSTATEMENTS.
>   3) On upgrade, Derby will automatically invalidate all compiled plans
>   for statements in SYS.SYSSTATEMENTS, so no need for any special
>   handling for the persisted cached statements (though, strictly
>   speaking it would be better to drop those statements than to fill the
>   cache with uncompiled statements).
>   4) On soft-upgrade, I think it might be a good idea to disable
>   persisting of the cache entirely, since we wouldn't want to end up
>   with SYS.SYSSTATEMENTS containing statements compiled for different
>   versions of Derby.
> I think (must check to be sure) that in the current code, Derby needs to
> know whether or not the statement should be a stored prepared statement
> or an ordinary statement when the statement is compiled, and that an
> entry is added in the SYS.SYSSTATEMENTS table before the statement is
> compiled. If this is done for all statements, even the ones that won't
> still be in the cache when the database shuts down, there will be an
> extra, unnecessary compile-time cost for those statements.
> This could be stated as an expected overhead caused by enabling the
> persistent cache, but it also makes a solution where the users
> explicitly say which statements they want to persist attractive. That
> again leaves us with the question of how to dispose of them, since we
> surely don't want to keep all the different variations over the same
> statement that different versions of an application have used. Removing
> such statements from SYS.SYSSTATEMENTS when they are evicted from the
> statement cache is one option, I would guess.
> Another perhaps better option if we want to store the entire statement
> cache, is to find a way to convert a non-storable statement to a
> storable statement when we decide that it should be stored. Don't know
> how difficult that would be.
>> Besides version changes the statements should probably be
>> removed/regenerated if the statistics used to determine the execution
>> plan change. How does derby manage the statistics, are they
>> automatically kept up to date or does the user have to do something
>> specific? Do you happen to know what happens with the stored trigger
>> actions, since they should have the same problem. I'll take a look at
>> the code to try to figure it out in the next few days but just curious
>> if you or another derby dev knows...
> There is an undocumented (doc request logged as DERBY-3134) property,
> derby.language.stalePlanCheckInterval, which tells how often the plan
> for a prepared statement should be checked. If the plan is found to be
> stale (not sure if it looks at cardinality statistics or if it just
> checks the estimated size of the tables involved) a recompile is
> initiated. If the statement is an SPS, the new plan is also written back
> to the system table.

View raw message