db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statistics of an index
Date Tue, 18 Nov 2008 10:05:44 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648558#action_12648558 ]

Knut Anders Hatlen commented on DERBY-3788:
-------------------------------------------

Thanks for the updated patch, Mamta.

I'm afraid I cannot offer very much guidance, but I have some
questions and comments:

1) This patch addresses the problem with non-existing statistics, not
the problem with outdated statistics. In which situations don't the
statistics exist? If it doesn't happen very often, it might be fine to
update the statistics in the same thread.

2) Are the calls to EmbedConnection30.setupContextStack() and
restoreContextStack() needed around the call to execute()? I thought
execute() would call setup/restoreContextStack() itself.

3) Creating an EmbedConnection30 object directly breaks the
modularity. Unless the calls to the internal methods are necessary, it
may be better to use InternalDriver.activeDriver().connect(url, info)
instead.

4) I think that the creation of a new connection will reboot the
database if it has been shut down in the user thread, which may lead
to unpredictable behaviour. It also seems like it will preserve all
connection attributes, such as attributes to re-encrypt the database or
to start a replication master.

5) There's a comment in DDImpl5.updateStatisticsInBackGround() saying
that "cm is null the very first time, and whenever we aren't actually
nested." I'm not sure I understand that comment. Why is it null the
first time? And isn't the method always called in a nested context?
And if it is null, wouldn't that cause a NullPointerException in
EmbedConnection's constructor when url=null is passed in?

6) DDImpl5.updateStatisticsInBackGround() updates the shared variable
executorForUpdateStatistics if it is null. But it is not protected by
synchronization, so race conditions are possible.

7) DDImpl5.stop() should call super.stop().

8) In BackgroundUpdateStatisticsTask, using a prepared statement with
the table name and the index name parametrized would be better because
it would handle quoting special characters correctly (not handled in
the current patch) and it would reduce the number of entries in the
statement cache.

9) As to the locking issues, I would have tried to call
Connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED)
in the background thread to see if that solved/reduced the issues.
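Items 6, 8 and 9 can be illustrated together in a small sketch. It is not the code from the patch: the class name BackgroundStatsSketch and the methods getExecutor() and updateIndexStatistics() are hypothetical, and it assumes the background task calls the SYSCS_UTIL.SYSCS_UPDATE_STATISTICS procedure from DERBY-269 (schema, table and index name as arguments) rather than whatever statement text the patch builds. The shared executor is lazily created with double-checked locking on a volatile field, the names are bound as parameters so no identifier quoting is needed and only one statement text enters the statement cache, and the connection is switched to READ UNCOMMITTED before the call. Whether that isolation level actually reduces the lock conflicts is exactly what item 9 suggests trying.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of a background statistics updater; not Derby's code. */
public class BackgroundStatsSketch {

    // Item 6: 'volatile' plus double-checked locking, so two threads that
    // both see null can never each create (and leak) their own executor.
    private static volatile ExecutorService executorForUpdateStatistics;

    static ExecutorService getExecutor() {
        ExecutorService e = executorForUpdateStatistics;
        if (e == null) {
            synchronized (BackgroundStatsSketch.class) {
                e = executorForUpdateStatistics;   // re-check under the lock
                if (e == null) {
                    e = Executors.newSingleThreadExecutor(r -> {
                        Thread t = new Thread(r, "update-statistics");
                        t.setDaemon(true);         // never keeps the JVM alive
                        return t;
                    });
                    executorForUpdateStatistics = e;
                }
            }
        }
        return e;
    }

    // Items 8 and 9: refresh one index's statistics on a connection that
    // belongs to the background thread (assumed to be opened elsewhere).
    static void updateIndexStatistics(Connection conn, String schema,
                                      String table, String index)
            throws SQLException {
        // Item 9: try to avoid blocking on, or being blocked by, user locks.
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
        // Item 8: parameter markers, so names with special characters need no
        // quoting and every table/index reuses the same cached statement.
        try (CallableStatement cs = conn.prepareCall(
                "CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS(?, ?, ?)")) {
            cs.setString(1, schema);   // catalog names, e.g. upper case
            cs.setString(2, table);
            cs.setString(3, index);
            cs.execute();
        }
    }
}
```

An AtomicReference with compareAndSet, or a lazily loaded holder class, would work equally well; what matters is that the null check and the assignment cannot interleave between threads.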

> Provide a zero-admin way of updating the statistics of an index
> ---------------------------------------------------------------
>
>                 Key: DERBY-3788
>                 URL: https://issues.apache.org/jira/browse/DERBY-3788
>             Project: Derby
>          Issue Type: New Feature
>          Components: Performance
>    Affects Versions: 10.5.0.0
>            Reporter: Mamta A. Satoor
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, DERBY3788_patch2_diff.txt,
>                      DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new system stored
> procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It would be good for Derby to provide an
> automatic way of updating the statistics without requiring users to run the stored procedure
> manually. There was some discussion on DERBY-269 about providing the zero-admin way. I have
> copied it here for reference.
> *********************
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero-admin solution for updating statistics would be preferable to the
> manual 'update statistics'.
> *********************
> *********************
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions; how about the following zero-admin solution? It
> is not perfect - suggestions welcome.
> Along with storing the statistics, save how many rows were in the table when exact
> statistics were calculated. This number is 0 if none have been calculated because index
> creation happened on an empty table. At query compile time, when we look up statistics, we
> automatically recalculate the statistics at certain thresholds - say something like the row
> count growing past the next threshold: 10, 100, 1000, 100000 - with the upper limit being
> somewhere around how many rows we can process in some small amount of time, like 1 second
> on a modern laptop. If we are worried about response time, maybe we queue the stat gathering
> in the background rather than waiting, with maybe some quick load if no stat has ever been
> gathered. The background gathering could be optimized to not interfere with locks by using
> read uncommitted.
> I think it would be useful to also keep the manual call, just to make it easy to support
> customers and debug issues in the field. There is probably always some dynamic data
> distribution change that in some case won't be picked up by the automatic algorithm. It is
> also very useful for those who have complete control of the whole process: create DDL, load
> data, run stats, deliver application.
> *********************
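The threshold rule in Mike Matrigali's comment can be sketched as a pure function. This is an illustration under stated assumptions, not Derby code: StatsThresholdSketch, nextThreshold() and needsRecompute() are hypothetical names, the threshold list is copied verbatim from the comment, "growing past" is read as "reaching", and once the row count stored with the last statistics is at or beyond the largest threshold the sketch stops triggering automatic updates, standing in for the proposed upper limit. Whether the work then runs inline or is queued in the background is left out.

```java
/** Hypothetical sketch of the proposed row-count thresholds; not Derby code. */
public final class StatsThresholdSketch {

    // Thresholds as listed in the comment above (illustrative values only).
    private static final long[] THRESHOLDS = {10, 100, 1000, 100000};

    private StatsThresholdSketch() {}

    /** Smallest threshold strictly greater than rows, or -1 if there is none. */
    static long nextThreshold(long rows) {
        for (long t : THRESHOLDS) {
            if (t > rows) {
                return t;
            }
        }
        return -1;
    }

    /**
     * rowsAtLastStats is the row count saved when statistics were last
     * computed; 0 means they never were (index created on an empty table).
     * Returns true when the current row count has reached the next threshold.
     */
    static boolean needsRecompute(long rowsAtLastStats, long currentRows) {
        long next = nextThreshold(rowsAtLastStats);
        return next != -1 && currentRows >= next;
    }
}
```

For example, statistics gathered at 10 rows are not refreshed until the table holds 100 rows; after that refresh, the next one waits for 1000 rows.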

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

