impala-user mailing list archives

From 기준 <0ctopus13pr...@gmail.com>
Subject Enable Impala-kudu table, column stats.
Date Tue, 25 Apr 2017 11:26:59 GMT
Hi!

I'm using impala-kudu currently.

Impala's version is v2.7.0
Kudu's version is 1.3

I learned about table statistics a few days ago.

So I tried computing statistics with the `compute stats` command.

After a short time it finished with no errors on my screen, but all the row counts were -1.

So I searched for this and found the following issue:
https://issues.apache.org/jira/browse/IMPALA-2830

Question 1. Can I set the row count manually?

I also found that the column statistics were computed with wrong values.

For example, one column's actual number of distinct values was 5092153,
but the command `show column stats ${table}` shows 5405440.
(Other columns are similarly off.)
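
To put a number on the mismatch, here is a minimal sketch that computes the gap between the two values quoted above. The helper name `relative_error` is only illustrative; the two inputs are the figures from this message.

```python
# Values quoted in this message.
actual_ndv = 5_092_153     # distinct values actually present in the column
reported_ndv = 5_405_440   # value shown by `show column stats`


def relative_error(reported: int, actual: int) -> float:
    """Signed relative difference of the reported value from the actual one."""
    return (reported - actual) / actual


gap = reported_ndv - actual_ndv
print(f"absolute gap: {gap}")
print(f"relative gap: {relative_error(reported_ndv, actual_ndv):.2%}")
```

So the reported value is roughly 6% higher than the actual distinct count.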

Question 2. Why does this difference happen? Also, can I set the value manually?

Finally, I'm not sure whether Impala uses this information during query
processing.

For example, I issued `SELECT COUNT(DISTINCT ${column}) FROM ${table}`,
and using the `summary` command I saw that Impala scans the data from Kudu.

+--------------+--------+----------+----------+---------+------------+-----------+---------------+---------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
+--------------+--------+----------+----------+---------+------------+-----------+---------------+---------------+
| 06:AGGREGATE | 1      | 121.09us | 121.09us | 1       | 1          | 64.00 KB  | -1 B          | FINALIZE      |
| 05:EXCHANGE  | 1      | 61.82us  | 61.82us  | 12      | 1          | 0 B       | -1 B          | UNPARTITIONED |
| 02:AGGREGATE | 12     | 3.71ms   | 5.53ms   | 12      | 1          | 16.00 KB  | 10.00 MB      |               |
| 04:AGGREGATE | 12     | 171.00ms | 181.15ms | 5.09M   | 5.41M      | 154.58 MB | 11.57 MB      |               |
| 03:EXCHANGE  | 12     | 12.85ms  | 14.27ms  | 7.93M   | 5.41M      | 0 B       | 0 B           | HASH(c)       |
| 01:AGGREGATE | 12     | 2.72s    | 4.80s    | 7.93M   | 5.41M      | 170.08 MB | 138.88 MB     | STREAMING     |
| 00:SCAN KUDU | 12     | 991.95ms | 5.38s    | 963.00M | 963.00M    | 2.30 MB   | 0 B           |               |
+--------------+--------+----------+----------+---------+------------+-----------+---------------+---------------+
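
To make the summary easier to compare with the statistics quoted earlier, here is a small sketch that lines up the `Est. #Rows` and `#Rows` columns with the two distinct-value figures from this message. The helpers `parse_count` and `in_millions` are hypothetical names, and the K/M/B suffix handling is an assumption about how the summary abbreviates counts. The sketch only checks whether the rounded numbers coincide; it says nothing about how the planner derived its estimates.

```python
def parse_count(text: str) -> float:
    """Convert a summary count such as '5.41M' or '12' to a number."""
    suffixes = {"K": 1e3, "M": 1e6, "B": 1e9}  # assumed abbreviations
    if text[-1] in suffixes:
        return float(text[:-1]) * suffixes[text[-1]]
    return float(text)


def in_millions(n: int) -> str:
    """Format a count the way the summary shows millions, e.g. '5.41M'."""
    return f"{n / 1e6:.2f}M"


# (operator, #Rows, Est. #Rows) copied from the summary output.
rows = [
    ("04:AGGREGATE", "5.09M", "5.41M"),
    ("03:EXCHANGE", "7.93M", "5.41M"),
    ("01:AGGREGATE", "7.93M", "5.41M"),
    ("00:SCAN KUDU", "963.00M", "963.00M"),
]

reported_ndv = 5_405_440  # from `show column stats`
actual_ndv = 5_092_153    # actual distinct count

for op, actual, est in rows:
    ratio = parse_count(est) / parse_count(actual)
    print(f"{op}: estimated/actual = {ratio:.2f}")

# Compare the rounded stats values with the 04:AGGREGATE row.
print("reported NDV:", in_millions(reported_ndv), "vs Est. #Rows", rows[0][2])
print("actual NDV:  ", in_millions(actual_ndv), "vs #Rows", rows[0][1])
```

When rounded to the summary's precision, the `Est. #Rows` of 04:AGGREGATE equals the value from `show column stats`, and its `#Rows` equals the actual distinct count.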

Why did Impala not use the column statistics?

Question 3. If I can set statistics values manually, will Impala understand
them?
It seems Impala does not use the computed statistics.

I'm working on this, but it's hard to find more information.

Thanks! Have a nice day.
