Return-Path: Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: (qmail 87397 invoked from network); 26 Jul 2010 16:19:16 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 26 Jul 2010 16:19:16 -0000 Received: (qmail 39370 invoked by uid 500); 26 Jul 2010 16:19:16 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 39328 invoked by uid 500); 26 Jul 2010 16:19:16 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 39321 invoked by uid 99); 26 Jul 2010 16:19:15 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Jul 2010 16:19:15 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.22] (HELO thor.apache.org) (140.211.11.22) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Jul 2010 16:19:12 +0000 Received: from thor (localhost [127.0.0.1]) by thor.apache.org (8.13.8+Sun/8.13.8) with ESMTP id o6QGIo29020252 for ; Mon, 26 Jul 2010 16:18:51 GMT Message-ID: <22090711.588211280161130372.JavaMail.jira@thor> Date: Mon, 26 Jul 2010 12:18:50 -0400 (EDT) From: "Mamta A. Satoor (JIRA)" To: derby-dev@db.apache.org Subject: [jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index In-Reply-To: <1231890337.1216665811829.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/DERBY-3788?page=3Dcom.atlassian= .jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D1289= 2343#action_12892343 ]=20 Mamta A. Satoor commented on DERBY-3788: ---------------------------------------- Just wanted to summarize what has been tried for this jira so far and a pro= posal for a possible solution worth trying 1)First I tried to see if the missing stats could be created inline when th= e SELECT query compilation finds that missing but that failed because SELEC= T query compilation marked the data dictionary in read only mode where as u= pdate statistics needs to update the data dictionary(this marking of read o= nly for data dictionary happens in GenericStatement.prepMinion). Because of= this read-only marking, we can't create the statistics inline of the compi= lation phase of the SELECT query.=20 2)Second approach was during the SELECT compile phase, if we come across mi= ssing stats, then schedule a task for later to update the stats. Finish the= SELECT compilation and execution after the update stat task is scheduled o= n the queue (the scheduling mechanism used is only available with jdk 1.5 a= nd higher). When the scheduled task was later fired, it would run no curren= t connection exception. The reason for that was that there was no context s= et for update statistics to run. Tried resolving this by creating a connect= ion context through following call InternalDriver.activeDriver().connect(ur= l, info) I had the SELECT compilation pass the needed information to the ba= ckground task so I can create the connection. This is pretty crude way of g= etting a connection to update the statistics. Along with the fact that the = code to create the connection context was not very modular, some of the exi= sting tests failed with this changes because of locking issues. This probab= ly is because the locks held by the background thread for update stats inte= rfered with locks required by the rest of the test.=20 3)The third approach I tried was when the SELECT query detects missing stat= s, abort the query compilation, (thus removing the data dictionary from rea= d-only mode), create the stats and then restart the original SELECT query c= ompilation and execution. The hope with this approach was that it will avoi= d running into locking issues. If we decide to pursue this approach more, w= e need to make sure that we look at the possibility of stats still missing = when the original query is compile the 2nd time. This can happen say becaus= e the user does not have privileges to update the stats, the update stat ra= n into locking issues. If we won't account for possibility of missing stats= during the 2nd time through the original query, we can end up in an infini= te loop. Another problem with this approach was that update stats were happ= ening in the same transaction as the original SELECT query. Which means tha= t the locks acquired by the update stat will not be released until the tran= saction is committed. Came across locking issues with this when I ran the e= xisting junit and derbyall tests. One way to fix this could be to start a n= ested user transaction and do the update stat in that transaction and commi= t that transaction(I think something like this is done for identity columns= . If yes, then we might find pointers there to use for update stats in nest= ed user transaction). Thus any locks acquired by the update stat work will = be released. And after that, go back to the original query again for compil= ation and execution.=20 4)With any approach we take, we should detect the case where the table is e= mpty and hence the stats might be missing. For such a case, we should not t= ry to fire stats creation task. to rephrase, during the compile phase of th= e SELECT query, when we look for statistics, first check if the table is em= pty and if yes then skip the code for collecting stats. From what I recall,= one can check if the table is empty by issuing open scan controller and as= king it for the number of rows in the table. If we do not check for empty t= able, then we will end up in infinite loop.=20 5)For read-only db we should detect that we are readonly and don't try to d= o update statistics=20 6)Also, we probably want to create statistics only for user tables? Haven't= thought enough about it.=20 7)What if the user executing the query is not the owner of the table. shoul= d statistics still be created? I think if the user does not have enough pri= vileges, then skip the step of creating stats since it's going to fail anyw= ays.=20 8)Once statistics are created, notify tabledescriptor about it because tabl= edescriptor cahces available statistics and it will never know of the new = statistics since tabledescriptor holds on to that cache.=20 9)Once we have update stats working, another improvement could be to identi= fy the cached compiled queries that may benefit from updated stats and have= those queries recompile when they are executed next time. In other words, = the relevant queries should be invalidated as part of the statistics collec= tion task.=20 10)Look at other scenarios(in addition to step 7) above) where we know that= update stats might fail. For those cases, don't spend time trying to creat= e the stats. Instead just let the optimizer work with the information it ha= s at the moment to come up with the query plan.=20 > Provide a zero-admin way of updating the statisitcs of an index > --------------------------------------------------------------- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: Improvement > Components: Store > Affects Versions: 10.5.1.1 > Reporter: Mamta A. Satoor > Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt= , DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY3788_patch3_di= ff.txt, DERBY3788_patch3_stat.txt, DERBY3788_update_all_missing_stats_after= _bind_patch4_diff.txt, DERBY3788_update_all_missing_stats_after_bind_patch4= _stat.txt, DERBY_3788_Mgr.java, DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new = system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good= for Derby to provide an automatic way of updating the statistics without r= equiring to run the stored procedure manually. There was some discussion on= DERBY-269 about providing the 0-admin way. I have copied it here for refer= ence. > ********************* > Kathey Marsden - 22/May/05 03:53 PM=20 > Some sort of zero admin solution for updating statistics would be preffer= able to the manual 'update statistics'=20 > ********************* > ********************* > Mike Matrigali - 11/Jun/08 12:37 PM=20 > I have not seen any other suggestions, how about the following zero admin= solution? It is not perfect - suggestions welcome.=20 > Along with the statistics storing, save how many rows were in the table w= hen exact statistics were calculated. This number is 0 if none have been ca= lculated because index creation happened on an empty table. At query compil= e time when we look up statistics we automatically recalculate the statisti= cs at certain threshholds - say something like row count growing past next = threshhold : 10, 100, 1000, 100000 - with upper limit being somewhere aroun= d how many rows we can process in some small amount of time - like 1 second= on a modern laptop. If we are worried about response time, maybe we backgr= ound queue the stat gathering rather than waiting with maybe some quick loa= d if no stat has ever been gathered. The background gathering could be opti= mized to not interfere with locks by using read uncommitted.=20 > I think it would be useful to also have the manual call just to make it e= asy to support customers and debug issues in the field. There is proably al= ways some dynamic data distribution change that in some case won't be picke= d up by the automatic algorithm. Also just very useful for those who have c= omplete control of the create ddl, load data, run stats, deliver applicatio= n process.=20 > ********************* --=20 This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.