hive-user mailing list archives

Site index · List index
Message view
Top
From Raajay <raaja...@gmail.com>
Subject CBO - get cost of the plan
Date Tue, 25 Aug 2015 00:19:04 GMT
```Hello,

I am interested to get the cost of the query plans as calculated by the
CBO. How can I get that information ? For example, consider a query with a
three way join of the following form:

Query
=====

insert overwrite table output_tab
select
a_day, a_product, b_alternate, (a_sales + b_sales + c_sales) as total_sales
from
tableA a join tableB b
on a.a_day = b.b_day and a.a_product = b.b_product
join tableC c
on b.b_day = c.c_day and b.b_alternate = c.c_alternate;

The number of rows for tableA, tableB, and tableC are of the order of
10000. I believe, that by "analyzing columns" of all the tables Hive will
have statistics regarding the number of rows, distinct values, etc.
However, when I try to print out the operator tree as determined by the
CalcitePlanner, I get the following output.

Print out of the Operator Tree
======================

HiveProject(a_day=[\$4], a_product=[\$5], b_alternate=[\$2],
total_sales=[+(+(\$6, \$3), \$9)]): rowcount =* 1.0*, cumulative cost = {4.0
rows, 0.0 cpu, 0.0 io}, id = 150
HiveJoin(condition=[AND(=(\$0, \$7), =(\$2, \$8))], joinType=[inner],
algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]): rowcount = 1.0,
cumulative cost = {4.0 rows, 0.0 cpu, 0.0 io}, id = 148
HiveJoin(condition=[AND(=(\$4, \$0), =(\$5, \$1))], joinType=[inner],
algorithm=[none], cost=[{2.0 rows, 0.0 cpu, 0.0 io}]): rowcount = 1.0,
cumulative cost = {2.0 rows, 0.0 cpu, 0.0 io}, id = 143
HiveProject(b_day=[\$0], b_product=[\$1], b_alternate=[\$2],
b_sales=[\$3]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0
io}, id = 138
HiveTableScan(table=[[default.tableb]]): rowcount = 1.0, cumulative
cost = {0}, id = 44
HiveProject(a_day=[\$0], a_product=[\$1], a_sales=[\$3]): rowcount =
1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 141
HiveTableScan(table=[[default.tablea]]): rowcount = 1.0, cumulative
cost = {0}, id = 42
HiveProject(c_day=[\$0], c_alternate=[\$2], c_sales=[\$3]): rowcount =
1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 146
HiveTableScan(table=[[default.tablec]]): rowcount = 1.0, cumulative
cost = {0}, id = 47

The number of rows as displayed here is 1.0, which is clearly not the
correct value.

- Raajay.

```
Mime
View raw message