impala-user mailing list archives

From Jeszy <jes...@gmail.com>
Subject Re: Enable Impala-kudu table, column stats.
Date Tue, 25 Apr 2017 11:41:33 GMT
Hey,

The difference in the distinct values is expected. The value comes from
the NDV function
(https://www.cloudera.com/documentation/enterprise/latest/topics/impala_ndv.html),
which returns an estimate; it is good enough for planning and much
faster to compute than an exact count. You can set both the table and
the column stats manually as described here:
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_perf_stats.html#perf_table_stats_manual
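
As a rough sketch of what the linked page describes, the statements
below set the row count and one column's distinct count by hand. The
table name my_table and column name c are placeholders, the row count
is rounded from the 963.00M in the pasted summary, and the distinct
count is the exact figure from the question. I have not verified that
these statements take effect on Kudu tables in Impala 2.7.0, so treat
that as an assumption.

```sql
-- Placeholder names: my_table and c are not from the original thread.

-- Set the table-level row count manually.
-- (Older releases also expect the STATS_GENERATED_VIA_STATS_TASK flag.)
ALTER TABLE my_table SET TBLPROPERTIES (
  'numRows' = '963000000',
  'STATS_GENERATED_VIA_STATS_TASK' = 'true'
);

-- Set the number of distinct values for a single column.
ALTER TABLE my_table SET COLUMN STATS c ('numDVs' = '5092153');

-- Compare the fast estimate with the exact (slower) count.
SELECT NDV(c) FROM my_table;
SELECT COUNT(DISTINCT c) FROM my_table;
```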

From your question, it seems you expected Impala to answer the distinct
count from the value stored in the column statistics. This is not
possible currently. The intent of these statistics is to help planning
by allowing Impala to make a more educated guess about join ordering
and cardinality. If you look at a summary like the one you pasted, the
estimated values (both #rows and peak mem) are influenced by stats.
There is no way to tell whether the stats are stale (only whether they
are computed or not).
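
To check whether stats are present at all, something along these
lines should work. The names my_table and c are placeholders. A value
of -1 in the #Rows or #Distinct Values column means that statistic has
not been computed or set.

```sql
-- Placeholder names: my_table and c are not from the original thread.

-- Table-level stats: -1 in #Rows means the row count is unknown.
SHOW TABLE STATS my_table;

-- Column-level stats: -1 in #Distinct Values means no NDV is stored.
SHOW COLUMN STATS my_table;

-- The plan shows the cardinality estimates the planner derived from stats.
EXPLAIN SELECT COUNT(DISTINCT c) FROM my_table;
```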

Does this answer your questions?

2017-04-25 13:26 GMT+02:00 기준 <0ctopus13prime@gmail.com>:
> Hi!
>
> I'm using impala-kudu currently.
>
> impala's version is v2.7.0
> kudu's version is 1.3
>
> I found out about table statistics a few days ago.
>
> So I tried computing statistics with the `compute stats` command.
>
> After a short time it finished with no errors shown on my screen, but all the row counts were -1.
>
> So I searched for this and found the following issue:
> https://issues.apache.org/jira/browse/IMPALA-2830
>
> Question 1. Can I manually set the row count?
>
> I also found that the column statistics were computed with wrong values.
>
> For example, one column's actual distinct count was 5092153,
> but the command `show column stats ${table}` shows 5405440.
> (Other columns are similarly off.)
>
> Question 2. Why does this difference happen? Also, can I set the value manually?
>
> Finally, I'm not sure whether Impala uses this information during query
> processing.
>
> For example,
> I issued `SELECT COUNT(DISTINCT ${column}) FROM ${table}`,
> and using the `summary` command I found that Impala scanned from Kudu.
>
> +--------------+--------+----------+----------+---------+------------+-----------+---------------+---------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
> +--------------+--------+----------+----------+---------+------------+-----------+---------------+---------------+
> | 06:AGGREGATE | 1      | 121.09us | 121.09us | 1       | 1          | 64.00 KB  | -1 B          | FINALIZE      |
> | 05:EXCHANGE  | 1      | 61.82us  | 61.82us  | 12      | 1          | 0 B       | -1 B          | UNPARTITIONED |
> | 02:AGGREGATE | 12     | 3.71ms   | 5.53ms   | 12      | 1          | 16.00 KB  | 10.00 MB      |               |
> | 04:AGGREGATE | 12     | 171.00ms | 181.15ms | 5.09M   | 5.41M      | 154.58 MB | 11.57 MB      |               |
> | 03:EXCHANGE  | 12     | 12.85ms  | 14.27ms  | 7.93M   | 5.41M      | 0 B       | 0 B           | HASH(c)       |
> | 01:AGGREGATE | 12     | 2.72s    | 4.80s    | 7.93M   | 5.41M      | 170.08 MB | 138.88 MB     | STREAMING     |
> | 00:SCAN KUDU | 12     | 991.95ms | 5.38s    | 963.00M | 963.00M    | 2.30 MB   | 0 B           |               |
> +--------------+--------+----------+----------+---------+------------+-----------+---------------+---------------+
>
> Why did Impala not use the column statistics?
>
> Question 3. If I can set the statistics manually, will Impala use
> them?
> It seems Impala does not use the computed statistics.
>
> I'm working on this, but it's hard to find out more.
>
> Thanks! Have a nice day.
