hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pengcheng Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14053) Hive should report that primary keys can't be null.
Date Tue, 20 Dec 2016 01:07:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15762827#comment-15762827
] 

Pengcheng Xiong commented on HIVE-14053:
----------------------------------------

[~ashutoshc], could u take another look? Thanks.

> Hive should report that primary keys can't be null.
> ---------------------------------------------------
>
>                 Key: HIVE-14053
>                 URL: https://issues.apache.org/jira/browse/HIVE-14053
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Carter Shanklin
>            Assignee: Pengcheng Xiong
>            Priority: Minor
>         Attachments: HIVE-14053.01.patch, HIVE-14053.02.patch
>
>
> HIVE-13076 introduces "rely novalidate" primary and foreign keys to Hive. With the right
driver in place, tools like Tableau can do join elimination and queries can run much faster.
> Some gaps remain, currently getAttributes() in HiveDatabaseMetaData doesn't work quite
right for keys. In particular, primary keys by definition are not null and the metadata should
reflect this for improved join elimination.
> In this example that uses the TPC-H schema and its constraints, we sum l_extendedprice
and group by l_shipmode. This query should not use more than just the lineitem table.
> With all the constraints in place, Tableau generates this query:
> {code}
> SELECT `lineitem`.`l_shipmode` AS `l_shipmode`,
>   SUM(`lineitem`.`l_extendedprice`) AS `sum_l_extendedprice_ok`
> FROM `tpch_bin_flat_orc_2`.`lineitem` `lineitem`
>   JOIN `tpch_bin_flat_orc_2`.`orders` `orders` ON (`lineitem`.`l_orderkey` = `orders`.`o_orderkey`)
>   JOIN `tpch_bin_flat_orc_2`.`customer` `customer` ON (`orders`.`o_custkey` = `customer`.`c_custkey`)
>   JOIN `tpch_bin_flat_orc_2`.`nation` `nation` ON (`customer`.`c_nationkey` = `nation`.`n_nationkey`)
> WHERE ((((NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL)))
AND ((NOT (`lineitem`.`l_partkey` IS NULL)) AND (NOT (`lineitem`.`l_suppkey` IS NULL)))) AND
(NOT (`nation`.`n_regionkey` IS NULL)))
> {code}
> Since these are the primary keys the denormalization and the where condition is unnecessary
and this sort of query can be a lot faster by just accessing the lineitem table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message