db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
Date Fri, 06 Feb 2009 22:01:00 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671349#action_12671349

Mamta A. Satoor commented on DERBY-3788:

Currently, I am pursuing running the missing statistics inline in the compile phase of a SELECT
query. The compile phase code is driven by GenericStatement.prepMinion. 

My first attempt was to run the update statistics inline in the optimize phase of a SELECT
sql (because in the optimize phase of a query, we look for statistics for determining the
query plan for execution and that is when we know that the statistics are missing) but there
I ran into data dictionary being in read-only mode. In GenericStatement.prepMinion method,
at line 307, we mark the data dictionary(DD) as read-only for the duration of bind and optimize
phase and hence statistics can't be updated because DD has been marked readonly at this time.

At line 422, after the optimize phase, we mark the DD as done reading and hence DD is not
in read-only mode during the generate phase. Based on this, I tried running the statistics
inline in the generate phase of SELECT query to establish the fact that the statistics can
indeed be run in-;ine. I ran into only one coding issues DERBY-4048 (about AlterTableConstantAction
not using execute transaction for updating the statistics). After resolving DERBY-4048, I
am able to run the update statistics inline in the generate phase of a SELECT query for a
simple test case as shown below
create table t1 (c11 int, c12 char);
create index i1 on t1(c11);
insert into t1 values (1,'1'),(2,'2'),(3,'3');
-- no statistics will be found for t1.i1 because table was empty when index i1 was created
select * from sys.sysstatistics;
-- during the generate phase of following query, we will run update statistics inline for
select * from t1 where c11=1;
-- this time, there will be statistics for t1.i1
select * from sys.sysstatistics;

Note, that there were no locking issues/privilege issues involved in the simple test case
above. But when update statistics runs into some problem, the question is how do we just revert
back the update statistics work and let the original SELECT query proceed with execution.
When I ran junit, towards the very beginning, I saw lot of problems related to not having
enough privileges, context stack issue etc because of update statistics.

I think the right approach would be to start with the compile phase of SELECT query and during
optimization, when we find that the statistics are missing, then quit the compilation of SELECT
query there, run the update statistics and refire the compilation of the SELECT query. This
way, update statistics will not run into any locking issues with SELECT query. I am going
to try to spend some time working on this approach and see how it goes. Would appreicate if
anyone has any feedback.

> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions:
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, DERBY3788_patch2_diff.txt,
DERBY3788_patch2_stat.txt, DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY_3788_Mgr.java,
> DERBY-269 provided a manual way of updating the statistics using the new system stored
procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good for Derby to provide an automatic
way of updating the statistics without requiring to run the stored procedure manually. There
was some discussion on DERBY-269 about providing the 0-admin way. I have copied it here for
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable to the
manual 'update statistics' 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin solution? It
is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when exact statistics
were calculated. This number is 0 if none have been calculated because index creation happened
on an empty table. At query compile time when we look up statistics we automatically recalculate
the statistics at certain threshholds - say something like row count growing past next threshhold
: 10, 100, 1000, 100000 - with upper limit being somewhere around how many rows we can process
in some small amount of time - like 1 second on a modern laptop. If we are worried about response
time, maybe we background queue the stat gathering rather than waiting with maybe some quick
load if no stat has ever been gathered. The background gathering could be optimized to not
interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy to support
customers and debug issues in the field. There is proably always some dynamic data distribution
change that in some case won't be picked up by the automatic algorithm. Also just very useful
for those who have complete control of the create ddl, load data, run stats, deliver application
> *********************

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message