db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
Date Mon, 09 Mar 2009 17:45:50 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mamta A. Satoor updated DERBY-3788:
-----------------------------------

    Attachment: DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt
                DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt

I have an intermediate patch ready for review. The goal of the patch is to detect early on
in the compile phase of a SELECT query if there are tables involved with missing statistics.
If yes, then abandon the SELECT query compilation, create the missing statistics and then
try to compile the original SELECT query again. This logic gets driven inside the GenericStatement.prepMinion
code. GenericStatement.prepMinion is responsible for compile phase of a SQL. If the SQL involved
is a SELECT statement, then we may go through the compile process of the SELECT query twice.
If the  SQL involved is non-SELECT, we will finish SQL compilation in the first pass. The
details about the code flow for SELECT statement compilation is as follows.

For a SELECT statement, we may have to go through the query compilation phase twice. Whether
we go through one cycle or 2 cycles of compilation phase depends on wheter there are any tables
involved in the query who have their statistics missing. During the first attempt of query
compile, we will make a list of all the tables involved in the query during the bind phase
of compile. After the bind phase, during the first pass through the SELECT query compilation,
we will check if the query involved has any tables whose statistics are missing. If yes, then
we will quit from the SELECT compile phase by throwing missing statistics exception. We will
handle the missing stats exception by trying to update the missing statistics. After that,
we will go through the second pass of original query compilation. During this second pass,
we will not worry if the statistics are missing or not. We will just work with whatever statistics
are available (one example where the statistics may still be missing after trying to create
them is say a user who only has select privileges on a table finds the missing stats. But
the statistics creation will fail because the user does not have sufficient privileges to
create the statistics. There can be other reasons for statistics creation failure too. Another
instance would be that we can't get the necessary locks to update the stats. In such a case,
we will just quit out of updating the stats and move on to the second pass of the original
query compilation. 

One of the issues with this patch I need to work on and will appreicate if anyone has any
feedback on. The update statistics is happening in the user transaction. What that means is
that any locks required to update statistics will stay in place until the user transaction
has been committed/rolled back. This behavior can be an issue with user applications (since
these are the locks the user was not expecting to get as part of a SELECT query execution).
This behavior definitely causes problems with our tests where quite a handful of tests run
into locking issues because of the additional locks acquired before of a SELECT query. I think
one way to fix this would be to somehow run the statistics in a nested transaction which can
be committed after the statistics are created. If we run into locking issues in the nested
transaction then go ahead and run the statistic in the user transaction. This is what we do
for GENERATED columns in InsertResultSet.getSetAutoincrementValue. One of the problems that
I need to address before I can use nested transaction is to change the code that is written
to fire statistics during SELECT compilation. Currently, in my patch, I am executing the stored
procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS which internally executes ALTER TABLE.. to update
the statistics. I am at too high a level to be able to use nested transaction for updating
the statistics (let me know if I am wrong here. May be there is a way to use nested transaction
while firing the SYSCS_UTIL.SYSCS_UPDATE_STATISTICS that I am unaware of). I think I need
to get to update statistics code in AlterTableConstantAction directly somehow without going
through the stored procedure->ALTER TABLE.. sql path. I think if I can directly call the
update statistics in AlterTableConstantAction, then I can try using the nested yransaction
and if that does not succeed then user the user transaction.

I will appreciate any input you might have on my overall approach to this jira entry and then
how to address the locking issue.

> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions: 10.5.0.0
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, DERBY3788_patch2_diff.txt,
DERBY3788_patch2_stat.txt, DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt,
DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt, DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new system stored
procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good for Derby to provide an automatic
way of updating the statistics without requiring to run the stored procedure manually. There
was some discussion on DERBY-269 about providing the 0-admin way. I have copied it here for
reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable to the
manual 'update statistics' 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin solution? It
is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when exact statistics
were calculated. This number is 0 if none have been calculated because index creation happened
on an empty table. At query compile time when we look up statistics we automatically recalculate
the statistics at certain threshholds - say something like row count growing past next threshhold
: 10, 100, 1000, 100000 - with upper limit being somewhere around how many rows we can process
in some small amount of time - like 1 second on a modern laptop. If we are worried about response
time, maybe we background queue the stat gathering rather than waiting with maybe some quick
load if no stat has ever been gathered. The background gathering could be optimized to not
interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy to support
customers and debug issues in the field. There is proably always some dynamic data distribution
change that in some case won't be picked up by the automatic algorithm. Also just very useful
for those who have complete control of the create ddl, load data, run stats, deliver application
process. 
> *********************

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message