hive-dev mailing list archives

From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-8316) CBO : cardinality estimation for filters is much lower than actual row count
Date Tue, 30 Sep 2014 22:10:33 GMT

     [ https://issues.apache.org/jira/browse/HIVE-8316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-8316:
----------------------------------
    Description: 
CBO underestimates filter selectivity, which in turn causes row counts to be underestimated
throughout the plan.


{code}
8 rows, 0.0 cpu, 0.0 io}, id = 7808
                                            HiveJoinRel(condition=[=($0, $12)], joinType=[inner]): rowcount = 11459.928208333333, cumulative cost = {5.50076555E8 rows, 0.0 cpu, 0.0 io}, id = 7426
                                              HiveProjectRel(ss_item_sk=[$1], ss_customer_sk=[$2], ss_cdemo_sk=[$3], ss_hdemo_sk=[$4], ss_addr_sk=[$5], ss_store_sk=[$6], ss_promo_sk=[$7], ss_ticket_number=[$8], ss_wholesale_cost=[$10], ss_list_price=[$11], ss_coupon_amt=[$18], ss_sold_date_sk=[$22]): rowcount = 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 893
                                                HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200_orig.store_sales]]): rowcount = 5.50076554E8, cumulative cost = {0}, id = 55
                                              HiveProjectRel(i_item_sk=[$0], i_current_price=[$5], i_color=[$17], i_product_name=[$21]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1163
                                                HiveFilterRel(condition=[AND(in($17, 'maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate'), between(false, $5, 35, +(35, 10)), between(false, $5, +(35, 1), +(35, 15)))]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1161
                                                  HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_200_orig.item]]): rowcount = 48000.0, cumulative cost = {0}, id = 68
{code}
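
The row counts in the plan above show how the filter estimate carries into the join. The join's rowcount matches the store_sales row count multiplied by the filtered item row count and divided by 48000. The sketch below reproduces that arithmetic. The formula and the reading of 48000 as the number of distinct join keys are assumptions inferred from the numbers, not a statement of how Hive's CBO is implemented, and the function name is made up for illustration.

```python
import math

# Row counts copied from the plan above.
STORE_SALES_ROWS = 5.50076554e8          # HiveProjectRel over store_sales
ITEM_ROWS = 48000.0                      # HiveTableScanRel over item
FILTERED_ITEM_ROWS = 1.0                 # HiveFilterRel estimate
JOIN_ROWS_IN_PLAN = 11459.928208333333   # HiveJoinRel estimate


def join_estimate(left_rows, right_rows, key_distinct_values):
    """Assumed textbook estimate: |L| * |R| / NDV(join key)."""
    return left_rows * right_rows / key_distinct_values


# Assumption: the join key has 48000 distinct values (one per item row).
estimate = join_estimate(STORE_SALES_ROWS, FILTERED_ITEM_ROWS, ITEM_ROWS)
matches_plan = math.isclose(estimate, JOIN_ROWS_IN_PLAN, rel_tol=1e-12)

# Fraction of item rows the filter is estimated to keep.
implied_selectivity = FILTERED_ITEM_ROWS / ITEM_ROWS

print(estimate, matches_plan, implied_selectivity)
```

Under this reading the join estimate scales linearly with the filter estimate, so any error in the filter's rowcount shows up with the same factor in the join above it.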

{code}
select count(*) from item where i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and
         i_current_price between 35 and 35 + 10 and
         i_current_price between 35 + 1 and 35 + 15;
{code}
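
Both BETWEEN predicates in the query above constrain the same column, so they are not independent conditions: together they reduce to a single range. The sketch below computes that combined range. It only illustrates the overlap. Whether the CBO treats the two predicates as independent when estimating the filter is not stated in this report, and the helper name is made up for illustration.

```python
def intersect(a, b):
    """Return the overlap of two inclusive ranges, or None if they are disjoint."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low <= high else None


# Bounds taken from the two BETWEEN predicates on i_current_price.
first = (35, 35 + 10)
second = (35 + 1, 35 + 15)

combined = intersect(first, second)
print(combined)  # (36, 45)
```

In other words, the price condition is equivalent to i_current_price between 36 and 45.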

  was:
CBO underestimates filter selectivity, which in turn causes row counts to be underestimated
throughout the plan.


{code}

{code}


> CBO : cardinality estimation for filters is much lower than actual row count
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-8316
>                 URL: https://issues.apache.org/jira/browse/HIVE-8316
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 0.14.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Harish Butani
>             Fix For: 0.14.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
