hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vineet Garg (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats
Date Thu, 31 Aug 2017 22:00:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vineet Garg updated HIVE-16811:
-------------------------------
    Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---------------------------------------
>
>                 Key: HIVE-16811
>                 URL: https://issues.apache.org/jira/browse/HIVE-16811
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>         Attachments: HIVE-16811.10.patch, HIVE-16811.11.patch, HIVE-16811.1.patch, HIVE-16811.2.patch,
HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, HIVE-16811.6.patch, HIVE-16811.7.patch,
HIVE-16811.8.patch, HIVE-16811.9.patch
>
>
> Currently Join ordering completely bails out in absence of statistics and this could
lead to bad joins such as cross joins.
> e.g. following select query will produce cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, S_NATIONKEY INT,

> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING)
> CREATE TABLE lineitem (L_ORDERKEY      INT,
>                                 L_PARTKEY       INT,
>                                 L_SUPPKEY       INT,
>                                 L_LINENUMBER    INT,
>                                 L_QUANTITY      DOUBLE,
>                                 L_EXTENDEDPRICE DOUBLE,
>                                 L_DISCOUNT      DOUBLE,
>                                 L_TAX           DOUBLE,
>                                 L_RETURNFLAG    STRING,
>                                 L_LINESTATUS    STRING,
>                                 l_shipdate      STRING,
>                                 L_COMMITDATE    STRING,
>                                 L_RECEIPTDATE   STRING,
>                                 L_SHIPINSTRUCT  STRING,
>                                 L_SHIPMODE      STRING,
>                                 L_COMMENT       STRING) partitioned by (dl int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
>     p_partkey INT,
>     p_name STRING,
>     p_mfgr STRING,
>     p_brand STRING,
>     p_type STRING,
>     p_size INT,
>     p_container STRING,
>     p_retailprice DOUBLE,
>     p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = l_partkey and s_suppkey
= l_suppkey;
> {code}
> Estimating stats will prevent join ordering algorithm to bail out and come up with join
at least better than cross join 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message