db-derby-dev mailing list archives

From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
Date Thu, 22 Jun 2017 01:16:00 GMT

    [ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058566#comment-16058566 ]

Bryan Pendleton commented on DERBY-6940:
----------------------------------------

Regarding your observation about average row size and cost estimation, I thought it worth
noting that there is some evidence that cost estimation problems substantially affect the
query plans chosen by the Derby optimizer. For example, see DERBY-1905, DERBY-1260,
DERBY-1205, DERBY-1259, and DERBY-1007.

I'm not suggesting that we act on any of those issues right away, just noting that it's
important to keep cost estimation behavior in mind as we study the query optimizer.


> Enhance derby statistics for more accurate selectivity estimates.
> -----------------------------------------------------------------
>
>                 Key: DERBY-6940
>                 URL: https://issues.apache.org/jira/browse/DERBY-6940
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>            Priority: Minor
>         Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff
>
>
> Derby should collect extra statistics at index build time and at statistics refresh time,
> which will help the optimizer make more precise selectivity estimates and choose better
> execution paths.
> We eventually want to use the new statistics to make better selectivity and cost estimates
> that will help find the best query plan. Currently Derby keeps two types of statistics:
> the total row count and the number of unique values.
> We are initially extending the statistics to include the null count, the minimum value and
> the maximum value associated with each column of an index. These would be useful in
> selectivity estimates for operators such as [ IS NULL, <, <=, >, >= ], all of which
> currently rely on hardwired selectivity estimates.
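To illustrate why the proposed statistics help, here is a minimal sketch of how a null count,
minimum and maximum could replace hardwired guesses for IS NULL and range predicates with
data-driven estimates. It is not Derby code: the class ColumnStatsSketch and its methods are
hypothetical, the column is assumed to be numeric, and the range estimates assume non-null
values are spread uniformly between the minimum and maximum (a common simplification; the
issue does not specify which formula Derby would use).

```java
/**
 * Hypothetical sketch (not Derby's API): estimating predicate selectivity
 * from per-column index statistics, assuming a uniform value distribution.
 */
public final class ColumnStatsSketch {
    private final long rowCount;      // existing statistic: total rows
    private final long distinctCount; // existing statistic: unique non-null values
    private final long nullCount;     // proposed statistic: rows where the column is NULL
    private final double min;         // proposed statistic: smallest non-null value
    private final double max;         // proposed statistic: largest non-null value

    public ColumnStatsSketch(long rowCount, long distinctCount, long nullCount,
                             double min, double max) {
        this.rowCount = rowCount;
        this.distinctCount = distinctCount;
        this.nullCount = nullCount;
        this.min = min;
        this.max = max;
    }

    /** Fraction of rows whose column value is not NULL. */
    private double nonNullFraction() {
        return rowCount == 0 ? 0.0 : (double) (rowCount - nullCount) / rowCount;
    }

    /** Selectivity of "col IS NULL": read directly from the null count. */
    public double isNull() {
        return rowCount == 0 ? 0.0 : (double) nullCount / rowCount;
    }

    /** Selectivity of "col = ?": needs only the row count and distinct count. */
    public double equalTo() {
        return distinctCount == 0 ? 0.0 : nonNullFraction() / distinctCount;
    }

    /**
     * Selectivity of "col < v" (treated the same as "col <= v" here),
     * assuming non-null values are spread uniformly over [min, max].
     */
    public double lessThan(double v) {
        if (v <= min) {
            return 0.0;                 // nothing lies below the minimum
        }
        if (v >= max) {
            return nonNullFraction();   // (approximately) every non-null value qualifies
        }
        // Linear interpolation; min < v < max guarantees max > min here.
        return nonNullFraction() * (v - min) / (max - min);
    }

    /** Selectivity of "col > v" (or ">="): non-null rows not counted by lessThan. */
    public double greaterThan(double v) {
        return nonNullFraction() - lessThan(v);
    }

    public static void main(String[] args) {
        // Made-up column: 1000 rows, 200 NULLs, 100 distinct values in [0, 100].
        ColumnStatsSketch stats = new ColumnStatsSketch(1000, 100, 200, 0.0, 100.0);
        System.out.println("IS NULL : " + stats.isNull());
        System.out.println("= ?     : " + stats.equalTo());
        System.out.println("< 25    : " + stats.lessThan(25.0));
        System.out.println("> 25    : " + stats.greaterThan(25.0));
    }
}
```

With these made-up numbers the sketch estimates IS NULL at 0.2 and "col < 25" at about 0.2,
both derived from the collected statistics rather than from a fixed constant. Real data is
often skewed, so a uniform-distribution estimate like this is only a first step beyond
hardwired values.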



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
