hive-issues mailing list archives

From "Mert Hocanin (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HIVE-21419) Partition Pruning not happening when using Apache Ranger masking
Date Wed, 01 May 2019 14:35:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mert Hocanin resolved HIVE-21419.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

> Partition Pruning not happening when using Apache Ranger masking
> ----------------------------------------------------------------
>
>                 Key: HIVE-21419
>                 URL: https://issues.apache.org/jira/browse/HIVE-21419
>             Project: Hive
>          Issue Type: Bug
>          Components: Physical Optimizer, Query Planning
>    Affects Versions: 2.3.2
>         Environment: I used an AWS CloudFormation script from the AWS Big Data Blog [1].
The EMR AMI uses Hive 2.3.3 and Apache Ranger 1.0.0.
> Source Table:
> CREATE EXTERNAL TABLE analyst1.lineitem_partitioned (
>     `l_orderkey` int, 
>     `l_partkey` int, 
>     `l_suppkey` int, 
>     `l_linenumber` int, 
>     `l_quantity` double, 
>     `l_extendedprice` double, 
>     `l_discount` double, 
>     `l_tax` double, 
>     `l_returnflag` string, 
>     `l_linestatus` string, 
>     `l_commitdate` string, 
>     `l_receiptdate` string, 
>     `l_shipinstruct` string, 
>     `l_shipmode` string, 
>    `l_comment` string
> ) PARTITIONED BY (`l_shipdate` string)
> STORED AS PARQUET
> LOCATION '/user/analyst1/tpch/sf100/lineitem';
> Destination Table:
> CREATE EXTERNAL TABLE analyst1.test1(
>    l_commitdate string,
>    l_receiptdate string
> ) PARTITIONED BY (`l_shipdate` string)
> STORED AS PARQUET
> LOCATION '/user/analyst1/tpch/sf100/lineitem_parq_partitioned';
> Query:
> insert overwrite table analyst1.test1 PARTITION (l_shipdate)
> select l_commitdate, l_receiptdate, l_shipdate
> from default.lineitem_parq_partitioned 
> where l_shipdate = '1992-01-02';
> Ranger Masking Rule:
> Hive Database: analyst1
> Hive Table: lineitem_partitioned
> Mask Condition Option: Custom: "XXXXXX" (replaces the column with a static string for simplicity; our real use case uses a complex UDF).
> [1] https://aws.amazon.com/blogs/big-data/implementing-authorization-and-auditing-using-apache-ranger-on-amazon-emr/ 
>            Reporter: Mert Hocanin
>            Priority: Major
>             Fix For: 3.0.0
>
>         Attachments: Operators-in-debugger-with-masking.png, Operators-in-debugger-without-masking.png,
hive-jira-schema-explain-plan.txt
>
>
> I have a partitioned table with a Ranger masking policy on a non-partition column. When a query on the table includes the masked column, partition pruning no longer occurs.
> To reproduce:
> Create two partitioned tables. I used TPC-H tables because they are publicly available; the schemas and queries I used are listed in the Environment field above. Insert into the second table from the first table. For example:
> insert overwrite table analyst1.test1 PARTITION (l_shipdate)
> select l_commitdate, l_receiptdate, l_shipdate
> from analyst1.lineitem_partitioned 
> where l_shipdate = '1992-01-02';
> I have attached the explain plan output with the masking rule on l_commitdate enabled and with it disabled.
> After some debugging, I see that the pruning expression is not set when the masking rule is enabled.
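
One way to check whether pruning is taking effect, sketched here as an assumption rather than as the reporter's own method, is to run Hive's EXPLAIN DEPENDENCY on the SELECT half of the repro query and compare the partitions it lists with the masking rule enabled and disabled. The table and column names are taken from the report; whether the output reflects this particular bug on a given Hive and Ranger build has not been verified here.

```sql
-- Sketch only: list the inputs Hive plans to read for the SELECT part of the repro.
-- The JSON output has "input_tables" and "input_partitions" entries.
-- If pruning works, only the l_shipdate=1992-01-02 partition should be listed;
-- a listing of every partition would suggest the filter was not used for pruning.
EXPLAIN DEPENDENCY
SELECT l_commitdate, l_receiptdate, l_shipdate
FROM analyst1.lineitem_partitioned
WHERE l_shipdate = '1992-01-02';
```

Run the statement once as a user subject to the masking policy and once as a user who is not, then compare the two partition lists.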



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
