db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Harshvardhan Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-6942) Utilise additional statistics for selectivity estimates.
Date Wed, 28 Jun 2017 19:40:00 GMT

    [ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067113#comment-16067113
] 

Harshvardhan Gupta commented on DERBY-6942:
-------------------------------------------

The estimateCost function in FromBaseTable.java is the guts of the cost and cardinality estimation
logic for base table predicates.

The function is called for each of conglomerate (i.e each of the access path possible on the
table such as indexes, full table scan). For predicates that can be utilized as start or stop
predicates for the given conglomerate (refer http://db.apache.org/derby/docs/10.13/tuning/ctunoptimz24840.html),
row estimates are obtained through the store by giving the actual constant keys specified
in the query, the estimate by row is quite accurate as it finds out the fraction of values
between the start and stop keys.

For all other remaining predicates one of the two things happen:

1) If an equality op is specified and statistics exist for the conglomerate (and it is not
a valid start/stop predicate), we query statistics for selectivity.(it takes into account
number of unique columns)

2) Hard wired selectivities are used for all other cases.


As part of DERBY-6940, we started collecting extra statistics such as null count, min and
max value for each of the columns of the index. We would like to utilise them to eliminate
usage of hard wired selectivities whenever possible.

For example - 

Let us say we have a table and an associated index created using the following statement -


create table abcd(id integer);
create index idx on abcd(id);

Now, suppose a query is issued - 

select * from abcd where id is NULL;

There are two possible access paths for the query and both are considered by the query optimizer
- 

1) Index Scan - The store is able to give away fairly accurate row estimates as 'NULL' is
a valid start as well as stop key.

2) Table scan - Here, hard wired selectivity estimates are used for row estimates as discussed
above.

> Utilise additional statistics for selectivity estimates.
> --------------------------------------------------------
>
>                 Key: DERBY-6942
>                 URL: https://issues.apache.org/jira/browse/DERBY-6942
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>            Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message