impala-issues mailing list archives

From "bharath v (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-5615) Compute Incremental stats is broken for general partition expressions
Date Tue, 25 Jul 2017 05:08:02 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

bharath v resolved IMPALA-5615.
-------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

https://gerrit.cloudera.org/#/c/7379/

> Compute Incremental stats is broken for general partition expressions
> ---------------------------------------------------------------------
>
>                 Key: IMPALA-5615
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5615
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: bharath v
>            Assignee: bharath v
>            Priority: Blocker
>              Labels: correctness
>             Fix For: Impala 2.10.0
>
>
> It turns out that the logic in ComputeStatsStmt#analyze() doesn't work well with general partition expressions. A simple repro is as follows:
> {noformat}
> 1) Prepare test data:
> create table pp(c int) partitioned by (p1 int, p2 int);
> insert into pp partition (p1=10, p2) select 1, 1;
> insert into pp partition (p1=10, p2) select 2,2;
> 2) Generate correct stats:
> compute stats pp;
> show table stats pp;
> Query: show table stats pp
> +-------+----+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------+
> | p1    | p2 | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                            |
> +-------+----+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------+
> | 10    | 1  | 1     | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | true              | hdfs://localhost:20500/test-warehouse/pp/p1=10/p2=1 |
> | 10    | 2  | 1     | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | true              | hdfs://localhost:20500/test-warehouse/pp/p1=10/p2=2 |
> | Total |    | 0     | 2      | 4B   | 0B           |                   |        |                   |                                                     |
> +-------+----+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------+
> Fetched 3 row(s) in 0.02s
> 3) Reproduce the issue:
> compute incremental stats pp partition (p1=10);
> show table stats pp;
> Query: show table stats pp
> +-------+----+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------+
> | p1    | p2 | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                            |
> +-------+----+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------+
> | 10    | 1  | 0     | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | true              | hdfs://localhost:20500/test-warehouse/pp/p1=10/p2=1 |
> | 10    | 2  | 0     | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | true              | hdfs://localhost:20500/test-warehouse/pp/p1=10/p2=2 |
> | Total |    | 0     | 2      | 4B   | 0B           |                   |        |                   |                                                     |
> +-------+----+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------+
> Fetched 3 row(s) in 0.01s
> {noformat}
> The bug is in the child queries generated by the incremental stats query.
> {noformat}
> SELECT NDV_NO_FINALIZE(c) AS c, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE), COUNT(c), p1, p2 FROM pp WHERE ((p1=10 AND p2=1) AND (p1=10 AND p2=2)) GROUP BY p1, p2
> SELECT COUNT(*), p1, p2 FROM pp WHERE ((p1=10 AND p2=1) AND (p1=10 AND p2=2)) GROUP BY p1, p2
> {noformat}
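> Specifically, the problem is in the generated filter predicate, {{((p1=10 AND p2=1) AND (p1=10 AND p2=2))}}. The per-partition conditions are combined with AND, so the predicate is contradictory (p2 cannot be both 1 and 2) and the child queries match no rows. It turns out that ComputeStatsStmt#analyze() is broken due to IMPALA-1654, and we need to rewrite the logic to support general partition expressions based on {{PartitionSet}}.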
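
The effect of that predicate can be reproduced outside Impala. The following is a minimal sketch, assuming SQLite as a stand-in engine (it has no partitions, so p1 and p2 are ordinary columns) and assuming that the intended predicate is the OR of the per-partition conditions. The helper `count_per_partition` is hypothetical and is not Impala code, and the OR form is an illustration, not the confirmed fix.

```python
import sqlite3

# SQLite stands in for Impala here; p1 and p2 are plain columns, not partition keys.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pp (c INT, p1 INT, p2 INT)")
conn.executemany("INSERT INTO pp VALUES (?, ?, ?)", [(1, 10, 1), (2, 10, 2)])

# One condition per partition selected by "partition (p1=10)".
partition_preds = ["(p1=10 AND p2=1)", "(p1=10 AND p2=2)"]


def count_per_partition(combiner):
    """Run the row-count child query with the conditions joined by `combiner`."""
    where = f" {combiner} ".join(partition_preds)
    sql = (
        f"SELECT COUNT(*), p1, p2 FROM pp WHERE ({where}) "
        "GROUP BY p1, p2 ORDER BY p1, p2"
    )
    return conn.execute(sql).fetchall()


and_rows = count_per_partition("AND")  # predicate shape from the report
or_rows = count_per_partition("OR")    # assumed intent: match any selected partition

print(and_rows)  # []
print(or_rows)   # [(1, 10, 1), (1, 10, 2)]
```

With AND, no row satisfies both conditions, so the query returns no per-partition counts, which is consistent with the zero row counts shown in the repro.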
> Workaround: Don't use general partition expressions; instead use a full partition spec, i.e., run compute incremental stats for one partition at a time.
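
For tables with many partitions, the workaround statements can be generated rather than typed by hand. This is a rough sketch, assuming the partition values are already known and are integers as in the repro; `incremental_stats_statements` is a hypothetical helper, not part of Impala or any client library.

```python
def incremental_stats_statements(table, partitions):
    """Build one 'compute incremental stats' statement per full partition spec.

    Each entry in `partitions` maps every partition column to its value,
    e.g. {"p1": 10, "p2": 1}. Only integer values are handled in this sketch.
    """
    statements = []
    for spec in partitions:
        spec_sql = ", ".join(f"{col}={val}" for col, val in spec.items())
        statements.append(
            f"compute incremental stats {table} partition ({spec_sql});"
        )
    return statements


stmts = incremental_stats_statements(
    "pp", [{"p1": 10, "p2": 1}, {"p1": 10, "p2": 2}]
)
for stmt in stmts:
    print(stmt)
# compute incremental stats pp partition (p1=10, p2=1);
# compute incremental stats pp partition (p1=10, p2=2);
```

Each generated statement names both p1 and p2, so it avoids the general partition expression path described above.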



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
