impala-issues mailing list archives

From "Vincent Tran (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-5494) NOT IN predicate shares the same selectivity as IN predicate
Date Sat, 17 Jun 2017 00:58:00 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vincent Tran resolved IMPALA-5494.
----------------------------------
    Resolution: Fixed

Author:     Vincent Tran <vttran@cloudera.com>
AuthorDate: 2017-06-13 03:07:04 -0400
Commit:     Impala Public Jenkins <impala-public-jenkins@gerrit.cloudera.org>
CommitDate: 2017-06-16 22:18:07 +0000

IMPALA-5494: Fixes the selectivity of NOT IN predicates

This change modifies the logic of NOT IN predicate so that
the planner can calculate the correct node cardinality. Prior
to this change, both IN and NOT IN predicates shared the
same selectivity, which resulted in the same cardinality
during planning.

The selectivity is calculated by the following heuristic:

selectivity = 1 - (num of predicate children /
                num of distinct values)
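
As a rough illustration of the heuristic above, the following Python sketch computes the IN and NOT IN selectivities from the size of the value list and the column's distinct-value count. The function names, the reading of "predicate children" as the number of values in the list, the cap at 1.0, and the -1.0 "unknown" result for missing stats are all assumptions made for illustration; they are not taken from Impala's InPredicate code.

```python
def in_selectivity(num_values, num_distinct_values):
    """Fraction of rows expected to match `col IN (...)`.

    Hypothetical helper: returns -1.0 when the distinct-value count is
    unknown (<= 0) and caps the result at 1.0.
    """
    if num_distinct_values <= 0:
        return -1.0
    return min(1.0, num_values / num_distinct_values)


def not_in_selectivity(num_values, num_distinct_values):
    """Fraction of rows expected to match `col NOT IN (...)`,
    i.e. 1 - (num of predicate children / num of distinct values)."""
    sel = in_selectivity(num_values, num_distinct_values)
    if sel < 0:
        return -1.0  # propagate "unknown" instead of guessing
    return 1.0 - sel
```

With this shape the two predicates are complementary: for any column with known stats, the IN and NOT IN selectivities sum to 1.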

Change-Id: I69e6217257b5618cb63e13b32ba3347fa0483b63
Reviewed-on: http://gerrit.cloudera.org:8080/7168
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins

> NOT IN predicate shares the same selectivity as IN predicate
> ------------------------------------------------------------
>
>                 Key: IMPALA-5494
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5494
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.8.0
>            Reporter: Vincent Tran
>            Assignee: Vincent Tran
>            Priority: Minor
>
> It appears that the cardinality estimated for a NOT IN predicate is the same as the one for an IN predicate. This is illogical, since the two predicates select complementary sets of rows.
>
> Logical approaches:
> When isNotIn() is true, the logic in InPredicate should set the selectivity to either:
> 1 - (numChild/NumDistinctValues)
> *OR*
> the default selectivity (i.e. 0.1), as in the equivalent case of multiple inequality predicates (seen below)
> {noformat}
> [kiwi-3:21000] > set explain_level=3;
> EXPLAIN_LEVEL set to 3
> [kiwi-3:21000] > explain select * from customers where id in (1,2);
> Query: explain select * from customers where id in (1,2)
> +-------------------------------------------------------+
> | Explain String                                        |
> +-------------------------------------------------------+
> | Per-Host Resource Reservation: Memory=0B              |
> | Per-Host Resource Estimates: Memory=16.00MB           |
> |                                                       |
> | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
> |   PLAN-ROOT SINK                                      |
> |   |  mem-estimate=0B mem-reservation=0B               |
> |   |                                                   |
> |   00:SCAN HDFS [default.customers]                    |
> |      partitions=1/1 files=1 size=15.44KB              |
> |      predicates: id IN (1, 2)                         |
> |      table stats: 53 rows total                       |
> |      column stats: all                                |
> |      parquet statistics predicates: id IN (1, 2)      |
> |      parquet dictionary predicates: id IN (1, 2)      |
> |      mem-estimate=16.00MB mem-reservation=0B          |
> |      tuple-ids=0 row-size=35B cardinality=2           |
> +-------------------------------------------------------+
> Fetched 16 row(s) in 0.02s
> [kiwi-3:21000] > explain select * from customers where id not in (1,2);
> Query: explain select * from customers where id not in (1,2)
> +-------------------------------------------------------+
> | Explain String                                        |
> +-------------------------------------------------------+
> | Per-Host Resource Reservation: Memory=0B              |
> | Per-Host Resource Estimates: Memory=16.00MB           |
> |                                                       |
> | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
> |   PLAN-ROOT SINK                                      |
> |   |  mem-estimate=0B mem-reservation=0B               |
> |   |                                                   |
> |   00:SCAN HDFS [default.customers]                    |
> |      partitions=1/1 files=1 size=15.44KB              |
> |      predicates: id NOT IN (1, 2)                     |
> |      table stats: 53 rows total                       |
> |      column stats: all                                |
> |      parquet dictionary predicates: id NOT IN (1, 2)  |
> |      mem-estimate=16.00MB mem-reservation=0B          |
> |      tuple-ids=0 row-size=35B cardinality=2           |
> +-------------------------------------------------------+
> Fetched 15 row(s) in 0.01s
> [kiwi-3:21000] > explain select * from customers where id !=1 and id !=2;
> Query: explain select * from customers where id !=1 and id !=2
> +-------------------------------------------------------+
> | Explain String                                        |
> +-------------------------------------------------------+
> | Per-Host Resource Reservation: Memory=0B              |
> | Per-Host Resource Estimates: Memory=16.00MB           |
> |                                                       |
> | F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
> |   PLAN-ROOT SINK                                      |
> |   |  mem-estimate=0B mem-reservation=0B               |
> |   |                                                   |
> |   00:SCAN HDFS [default.customers]                    |
> |      partitions=1/1 files=1 size=15.44KB              |
> |      predicates: id != 1, id != 2                     |
> |      table stats: 53 rows total                       |
> |      column stats: all                                |
> |      parquet dictionary predicates: id != 1, id != 2  |
> |      mem-estimate=16.00MB mem-reservation=0B          |
> |      tuple-ids=0 row-size=35B cardinality=5           |
> +-------------------------------------------------------+
> Fetched 15 row(s) in 0.02s
> {noformat}
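
To make the mismatch concrete, the sketch below applies both proposed options to the 53-row table from the plans above. The column's distinct-value count does not appear in the output; 53 is assumed here because it is consistent with the IN plan's cardinality=2. The helper name and the rounding are likewise assumptions, not the planner's actual code.

```python
TABLE_ROWS = 53            # from "table stats: 53 rows total"
ASSUMED_NDV = 53           # assumption: not shown in the explain output
DEFAULT_SELECTIVITY = 0.1  # default selectivity quoted in the report


def estimate_rows(selectivity, rows=TABLE_ROWS):
    """Hypothetical helper: scale the row count by a selectivity."""
    return round(rows * selectivity)


in_rows = estimate_rows(2 / ASSUMED_NDV)           # id IN (1, 2)
not_in_rows = estimate_rows(1 - 2 / ASSUMED_NDV)   # option 1 for NOT IN
default_rows = estimate_rows(DEFAULT_SELECTIVITY)  # option 2 for NOT IN

print(in_rows, not_in_rows, default_rows)
```

Under these assumptions the first option estimates 51 rows for `id NOT IN (1, 2)` rather than 2, while the default-selectivity option estimates 5, which matches the cardinality shown for the inequality plan.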



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
