db-derby-dev mailing list archives

From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
Date Thu, 04 Dec 2008 17:54:44 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Mamta A. Satoor updated DERBY-3788:

    Attachment: DERBY3788_patch3_stat.txt

I am attaching a new patch which incorporates the following comments from Knut.
2)Are the calls to EmbedConnection30.setupContextStack() and restoreContextStack() needed
around the call to execute()? I thought execute() would call setup/restoreContextStack() itself.

I removed the calls to set up and restore the context stack since, as Knut pointed out, they are unnecessary.
3)Creating an EmbedConnection30 object directly breaks the modularity. Unless the calls to
the internal methods are necessary, it may be better to use InternalDriver.activeDriver().connect(url,
info) instead. 
I removed the direct creation of EmbedConnection30 and instead use InternalDriver.activeDriver().connect(url,
5)There's a comment in DDImpl5.updateStatisticsInBackGround() saying that "cm is null the
very first time, and whenever we aren't actually nested." I'm not sure I understand that comment.
Why is it null the first time? And isn't the method always called in a nested context? And
if it is null, wouldn't that cause a NullPointerException in EmbedConnection's constructor
when url=null is passed in? 
When the ContextManager is null, I do not start statistics creation in the background,
because that would cause an NPE later on since url is null.
6) DDImpl5.updateStatisticsInBackGround() updates the shared variable executorForUpdateStatistics
if it is null. But it is not protected by synchronization, so race conditions are possible.

I now have synchronization around the code that could result in race conditions.
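
As a rough illustration of the kind of guard described above (not the code in the patch), the sketch below lazily creates a shared executor under a lock so that two threads cannot both see null and both create one. The class name StatsExecutorHolder, the thread name, and the choice of a single-thread executor are assumptions made for this example only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder; the patch keeps its executor in DDImpl5 instead.
final class StatsExecutorHolder {
    private final Object lock = new Object();
    private ExecutorService executor; // guarded by lock

    // Returns the shared executor, creating it on first use.
    ExecutorService getOrCreate() {
        synchronized (lock) {
            if (executor == null) {
                executor = Executors.newSingleThreadExecutor(r -> {
                    Thread t = new Thread(r, "stats-updater"); // assumed name
                    t.setDaemon(true); // do not keep the JVM alive
                    return t;
                });
            }
            return executor;
        }
    }

    // Shuts the executor down, e.g. from a stop() method.
    void stop() {
        synchronized (lock) {
            if (executor != null) {
                executor.shutdown();
                executor = null;
            }
        }
    }
}
```

Both the null check and the assignment happen inside the same synchronized block, which is what removes the race Knut described.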
7) DDImpl5.stop() should call super.stop(). 
Did this.
8) In BackgroundUpdateStatisticsTask, using a prepared statement with the table name and the
index name parametrized would be better because it would handle quoting special characters
correctly (not handled in 
the current patch) and it would reduce the number of entries in the statement cache. 
Now using PreparedStatement rather than Statement.
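
A minimal sketch of what a parametrized call could look like, assuming the three-argument form (schema, table, index) of SYSCS_UTIL.SYSCS_UPDATE_STATISTICS; the statement text used in the patch may differ. The class and method names here are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper, not the BackgroundUpdateStatisticsTask from the patch.
final class UpdateStatisticsCall {
    // One statement text for every table and index, so only one
    // entry is needed in the statement cache.
    static final String SQL = "CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS(?, ?, ?)";

    static void run(Connection conn, String schema, String table, String index)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            // Names are bound as values, so no manual quoting is needed.
            ps.setString(1, schema);
            ps.setString(2, table);
            ps.setString(3, index);
            ps.execute();
        }
    }
}
```

Because the names are passed as parameters rather than concatenated into the SQL text, special characters in them do not need to be escaped by the caller.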

The 2 comments from Knut that are not addressed in this patch are as follows:
4) I think that the creation of a new connection will reboot the database if it has been shut
down in the user thread, which may lead to unpredictable behaviour. It also seems like it
will preserve all connection attributes, like attributes to reencrypt the database or to start
replication master. 
I think there might be an issue as raised by Knut's comment 4). I still have to track down
why some of the junit test failures with my patch report "Failed to start(/create)
database". It might be because of the properties that are used to create the new Connection
object that runs the statistics update in the background. I am not sure yet whether it happens
during every run of the test jdbcapi.DriverMgrAuthenticationTest, but I have seen that test
be one of the tests that fail with "Failed to create database". I will work on figuring out
at what point the test fails with these errors. If anyone has any ideas on the failures or on
how we should create the new Connection object, I would greatly appreciate them.
9) As to the locking issues, I would have tried to call Connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED)
in the background thread to see if that solved/reduced the issues. 
I am not 100% sure about this, but I think using isolation Connection.TRANSACTION_READ_UNCOMMITTED
had slowed down test runs on my machine. I would like to address number 4 first so that at
least the test failures go down, and then I can focus on the use of isolation level
Connection.TRANSACTION_READ_UNCOMMITTED. There are a few tests that fail because of lock
timeouts, and maybe using the isolation level recommended by Knut will help.
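
For reference, switching the background connection to read uncommitted is a single standard JDBC call. The sketch below is only an illustration of Knut's suggestion; the helper name is hypothetical, and whether this actually reduces the lock timeouts has not been verified.

```java
import java.sql.Connection;
import java.sql.SQLException;

// Hypothetical helper for the background statistics connection.
final class BackgroundIsolation {
    // Switches conn to READ UNCOMMITTED and returns the previous level
    // so the caller can restore it afterwards if needed.
    static int useReadUncommitted(Connection conn) throws SQLException {
        int previous = conn.getTransactionIsolation();
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
        return previous;
    }
}
```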

So, at this point, I will focus on 4) to try to narrow down where the failures are coming
from. In the meantime, if the community has time to run the junit tests and sees any
noticeable performance problems with my patch, I would appreciate that feedback. Thanks

> Provide a zero-admin way of updating the statisitcs of an index
> ---------------------------------------------------------------
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions:
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, DERBY3788_patch2_diff.txt,
DERBY3788_patch2_stat.txt, DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY_3788_Mgr.java,
> DERBY-269 provided a manual way of updating the statistics using the new system stored
procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good for Derby to provide an automatic
way of updating the statistics without requiring to run the stored procedure manually. There
was some discussion on DERBY-269 about providing the 0-admin way. I have copied it here for
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be preferable to the
manual 'update statistics' 
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin solution? It
is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when exact statistics
were calculated. This number is 0 if none have been calculated because index creation happened
on an empty table. At query compile time when we look up statistics we automatically recalculate
the statistics at certain thresholds - say something like row count growing past next threshold
: 10, 100, 1000, 100000 - with upper limit being somewhere around how many rows we can process
in some small amount of time - like 1 second on a modern laptop. If we are worried about response
time, maybe we background queue the stat gathering rather than waiting with maybe some quick
load if no stat has ever been gathered. The background gathering could be optimized to not
interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy to support
customers and debug issues in the field. There is probably always some dynamic data distribution
change that in some case won't be picked up by the automatic algorithm. Also just very useful
for those who have complete control of the create ddl, load data, run stats, deliver application
> *********************
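
The threshold idea in Mike's comment above can be sketched as follows. This is only one possible reading: it uses the example thresholds exactly as he listed them, treats "growing past" as reaching or exceeding a threshold, and uses the last listed threshold as the upper limit. The class and method names are hypothetical.

```java
// Hypothetical sketch of the row-count threshold check.
final class StatsThresholds {
    // Example thresholds as listed in the comment; the last one
    // also serves as the upper limit in this sketch.
    private static final long[] THRESHOLDS = {10L, 100L, 1_000L, 100_000L};

    // rowsAtLastStats: row count saved when statistics were last
    // calculated (0 if never). currentRows: row count seen at
    // query compile time.
    static boolean shouldRecalculate(long rowsAtLastStats, long currentRows) {
        for (long t : THRESHOLDS) {
            // A threshold that lay ahead of the saved count has now been reached.
            if (rowsAtLastStats < t && currentRows >= t) {
                return true;
            }
        }
        return false;
    }
}
```

With this check, a table whose statistics were taken at 10 rows is not rescanned at 99 rows but is at 100, and nothing is triggered automatically once the saved count has reached the last threshold.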

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
