db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-6011) Derby performs very badly (seems to deadlock and timeout) in very simple multi-threaded tests
Date Fri, 18 Jan 2013 10:58:15 GMT

    [ https://issues.apache.org/jira/browse/DERBY-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557133#comment-13557133 ]

Knut Anders Hatlen commented on DERBY-6011:
-------------------------------------------

Thanks for the feedback, Dag, Bryan and Mamta.

> For example, if there was a unique index on columns (A, B, C), and a non-unique
> index on columns (B, E), and the query specifies B=7 and C=9, is it obvious that
> we should then favor the unique index?

In that example, this particular adjustment of the cost is not performed. It is
currently done only if

1) There is a unique index on the table, and

2) The query contains equality predicates for all columns in the unique index, and

3) The access path that's being explored (not the unique index) has an
estimated row count less than or equal to 1 before selectivity is applied

The example query does not satisfy requirement (2), as it doesn't have an
equality predicate for column A. The patch only removes requirement (3), so it
shouldn't have any effect on such a query.
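
As a rough illustration of the three requirements above, here is a minimal
Java sketch. It is not Derby's optimizer code: the class name, method names
and parameters are all hypothetical, and it only models when the cost
adjustment would be considered, before and after the patch, as described in
this comment.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the conditions described above; not Derby source code.
class UniqueIndexCostSketch {

    // Requirement (2): the query has an equality predicate on every column
    // of the unique index.
    static boolean coversAllIndexColumns(List<String> uniqueIndexColumns,
                                         Set<String> equalityColumns) {
        return equalityColumns.containsAll(uniqueIndexColumns);
    }

    // Before the patch: requirements (2) and (3) must both hold.
    // Requirement (1) holds by construction, since a unique index is passed in.
    static boolean appliesBeforePatch(List<String> uniqueIndexColumns,
                                      Set<String> equalityColumns,
                                      double otherPathRowCount) {
        return coversAllIndexColumns(uniqueIndexColumns, equalityColumns)
                && otherPathRowCount <= 1.0;
    }

    // After the patch: requirement (3) is dropped.
    static boolean appliesAfterPatch(List<String> uniqueIndexColumns,
                                     Set<String> equalityColumns) {
        return coversAllIndexColumns(uniqueIndexColumns, equalityColumns);
    }

    public static void main(String[] args) {
        List<String> uniqueIndex = Arrays.asList("A", "B", "C");

        // The example query: B=7 and C=9, no equality predicate on A.
        Set<String> partial = new HashSet<>(Arrays.asList("B", "C"));
        System.out.println(appliesBeforePatch(uniqueIndex, partial, 1.0));
        System.out.println(appliesAfterPatch(uniqueIndex, partial));

        // A query with equality predicates on A, B and C, where the other
        // access path has an estimated row count above 1.
        Set<String> full = new HashSet<>(Arrays.asList("A", "B", "C"));
        System.out.println(appliesBeforePatch(uniqueIndex, full, 5.0));
        System.out.println(appliesAfterPatch(uniqueIndex, full));
    }
}
```

In this model the example query is unaffected by the patch, because it fails
requirement (2) in both versions. Only queries that match every column of the
unique index change behaviour.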

> In the past, when confronted with tables that had unexpected concurrency hotspots
> because they were very small, I have used the technique (in my application) of adding
> large dummy data columns at the end of the rows, thus artificially bloating up my
> data so that rows are pushed to separate pages and the optimizer is less likely to
> prefer scans and more likely to favor indexes.
>
> Would that application technique be beneficial here?

If the JOBQUEUE table has three or more rows at the time the query is compiled,
the unique index is chosen, so the problem can be worked around that way if the
application logic accepts it. It also looks like turning off cost-based
optimization (by setting the derby.optimizer.ruleBasedOptimization property to
true) makes it pick the unique index. But that may have other unwanted effects,
and I don't think the rule-based optimizer is a documented feature.
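
For anyone who wants to experiment with that setting, a minimal sketch of one
way to set it follows. It assumes the property is read as a JVM system
property and must be set before the Derby engine boots; that assumption, and
the class and method names, are not confirmed by anything in this comment.

```java
// Hypothetical helper; assumes Derby reads this as a JVM system property
// at engine boot, so it must run before the first connection is opened.
class RuleBasedOptimizationSketch {

    static final String PROPERTY = "derby.optimizer.ruleBasedOptimization";

    static void enable() {
        System.setProperty(PROPERTY, "true");
    }

    public static void main(String[] args) {
        enable();
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

The same effect could presumably be had by passing the property with -D on
the java command line, under the same assumption.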

> Is it correct that all the queries which are *completely* covered by a unique
> index will now choose that unique index?

This particular optimization does not take into consideration whether or not
the index is covering. The unique index in the ManifoldCF test case is in fact
not a covering index.

I don't believe the patch will make all queries pick a unique covering
index if one is available, for two reasons:

1) The optimization is only applied if there are equality predicates for all
index columns, so the patch won't make a multi-row scan on a unique index any
more likely than it currently is.

2) It's just an adjustment of the cost, not a guarantee that the unique index
will be picked. There could still be other factors that make the overall
estimated cost for the unique index higher than some other alternative.

> Without the patch, *only* queries with very limited data in the table would
> have picked a non-unique index rather than unique index?

Yes, that's right. The intention of the current code seems to be that those
queries should have been using the unique index too. Unfortunately, another
optimizer tweak (see my comment from 16/Jan/13 12:37) makes the table appear
not so empty, and the cost adjustment in favor of the unique index is not
applied.

> I guess what would help is little blurb about what kind of queries will be
> impacted by this change.

I'll see if I can come up with some kind of systematic description of the
changes we should expect.

Thanks.
                
> Derby performs very badly (seems to deadlock and timeout) in very simple multi-threaded tests
> ---------------------------------------------------------------------------------------------
>
>                 Key: DERBY-6011
>                 URL: https://issues.apache.org/jira/browse/DERBY-6011
>             Project: Derby
>          Issue Type: Bug
>    Affects Versions: 10.7.1.1, 10.8.2.2, 10.9.1.0
>         Environment: Lenovo laptop with SSD's, Windows 7, 64-bit, Sun JDK 1.6.xx
>            Reporter: Karl Wright
>         Attachments: derby.log, force-specific-index.diff, manifoldcf.log, prefer-unique-index-v1.diff
>
>
> The Apache ManifoldCF project supports Derby as one of its underlying databases.  Simple tests, however, demonstrate that Derby is apparently deadlocking and timing out repeatedly under multi-thread conditions.  This problem is long-standing, is not exhibited by any other database ManifoldCF supports, and makes a simple test take between 6x and 12x as long.
> There is a trivial test that demonstrates the problem vs. other databases.  Please do the following (once you have Java 1.6+, svn 1.7+, and ant 1.7+ available):
> (1) Check out https://svn.apache.org/repos/asf/manifoldcf/trunk
> (2) Run the following ant target to download the dependencies: "ant make-core-deps"
> (3) Run the Derby test: "ant run-rss-tests-derby".  Note the time required - at least 180 seconds, can be up to 360 seconds.
> (4) Run the equivalent HSQLDB test: "ant run-rss-tests-HSQLDB".  This test takes about 31 seconds to run.
> The output of the Derby test can be found in the directory "tests/rss/test-derby-output".  Have a look at manifoldcf.log, where all long-running queries are reported.  Derby.log is also included, which shows only that during the test's cleanup phase the database is deleted before it is shut down, which is not pertinent to the performance issue.
> I am available to assist with ManifoldCF, if that seems to be required.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
