Return-Path: Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: (qmail 65224 invoked from network); 30 Jul 2008 20:20:22 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 30 Jul 2008 20:20:22 -0000 Received: (qmail 25054 invoked by uid 500); 30 Jul 2008 20:20:21 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 25018 invoked by uid 500); 30 Jul 2008 20:20:21 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 25007 invoked by uid 99); 30 Jul 2008 20:20:21 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 Jul 2008 13:20:21 -0700 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 30 Jul 2008 20:19:35 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id AF953234C181 for ; Wed, 30 Jul 2008 13:19:31 -0700 (PDT) Message-ID: <1588113168.1217449171717.JavaMail.jira@brutus> Date: Wed, 30 Jul 2008 13:19:31 -0700 (PDT) From: "Mamta A. Satoor (JIRA)" To: derby-dev@db.apache.org Subject: [jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index In-Reply-To: <1231890337.1216665811829.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618494#action_12618494 ] Mamta A. Satoor commented on DERBY-3788: ---------------------------------------- Currently, iapi.sql.dictionary.TableDescriptor has a method called statisticsExist which checks for the passed conglomerate if there is statistics availble or not. This method gets called 1)during the optimization phase of a query by impl.sql.compile.FromBaseTable.estimateCost 2)during the optimization phase of a query by impl.sql.compile.PredicateList.selectivity 3)by impl.sql.execute.AlterTableConstantAction to determine if the statistics already exist and if yes, then drop and recreate it. This happens during the compress table request(not sure if anything other than compress table in AlterTableConstantAction calls it but we do not need to worry about it because we only want to create the statistics if they don't already exist during the compile phase of a sql query) What I am considering doing is adding another method to TableDescriptor called say createStatistics which will be called during the optimization phase of a query by the 2 classes impl.sql.compile.FromBaseTable and impl.sql.compile.PredicateList. The only awkward thing I am finding is the code required to update the statistics need many objects like LanguageConnectionContext, DataDictionary, TransactionController, etc (there may be more) and they are not available to TableDescriptor class so I will have to pass these objects when a call is made to the new method in TableDescriptor from impl.sql.compile. Does this sound like not a smooth way of getting the objects? Maybe this new method should be defined in DataDictionary rather than TableDescriptor. I will work more on what is the right place for the new method. In the mean time, if anyone has any thoughts, please post them here. Once we have this new method to create the statistics, we can hopefully remove the redudant code that already exists to create the statistics in 3 different classes, namely, AlterTableConstantAction, CreateIndexConstantAction and InsetResultSet. > Provide a zero-admin way of updating the statisitcs of an index > --------------------------------------------------------------- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance > Affects Versions: 10.5.0.0 > Reporter: Mamta A. Satoor > Assignee: Mamta A. Satoor > > DERBY-269 provided a manual way of updating the statistics using the new system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good for Derby to provide an automatic way of updating the statistics without requiring to run the stored procedure manually. There was some discussion on DERBY-269 about providing the 0-admin way. I have copied it here for reference. > ********************* > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable to the manual 'update statistics' > ********************* > ********************* > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when exact statistics were calculated. This number is 0 if none have been calculated because index creation happened on an empty table. At query compile time when we look up statistics we automatically recalculate the statistics at certain threshholds - say something like row count growing past next threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere around how many rows we can process in some small amount of time - like 1 second on a modern laptop. If we are worried about response time, maybe we background queue the stat gathering rather than waiting with maybe some quick load if no stat has ever been gathered. The background gathering could be optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy to support customers and debug issues in the field. There is proably always some dynamic data distribution change that in some case won't be picked up by the automatic algorithm. Also just very useful for those who have complete control of the create ddl, load data, run stats, deliver application process. > ********************* -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.