impala-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zach Amsden (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-5003) Add 'constant propagation' for Views with a partition filter
Date Fri, 19 May 2017 19:10:04 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Zach Amsden resolved IMPALA-5003.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.9.0

Also want follow-up change:

https://gerrit.cloudera.org/#/c/6575/

> Add 'constant propagation' for Views with a partition filter
> ------------------------------------------------------------
>
>                 Key: IMPALA-5003
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5003
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.9.0
>            Reporter: Peter Ebert
>            Assignee: Zach Amsden
>              Labels: ramp-up
>             Fix For: Impala 2.9.0
>
>
> There are scenarios when you (1) want to reduce the number of partitions by hash&mod
or part of a date, etc and (2) do not want to expose this complexity to end users who do not
know how the table might be partitioned.
> As an example: Say I have a column that has 1000 values and I want to partition on customer_id,
but the data is too small for 1000 partitions so I want to mod the customer_id to put it into
100 'buckets', to give me a 100x faster scan when selecting a single customer_id.
> When using 3rd party tools or exposing this table to end users who may not be educated
on the partitioning scheme, those tools will not properly filter based on simply selecting
a specific customer_id.
> If this is my partitioned table:
> CREATE TABLE default.lesspartitions (   customer_id INT,    customer_name STRING,   
some_data STRING ) PARTITIONED BY (   partition_id INT ) 
> ...
> select customer_id, customer_name, some_data, customer_id % 100 as partition_id from
source
> It would be nice to create a view where the partition is hidden from users and filtered
on automatically:
> CREATE VIEW default.hiddenpartitions AS SELECT customer_id, customer_name, some_data
FROM default.lesspartitions WHERE partition_id = customer_id % 100
> So then a user/3rd party tool can write the following:
> select * from hiddenpartitions where customer_id = 4
> This jira would adding the 'constant propagation' of the filter into the view so that
only 1 partition is read.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message