spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-20331) Broaden support for Hive partition pruning predicate pushdown
Date Tue, 11 Jul 2017 06:51:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-20331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-20331.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0

Issue resolved by pull request 17633
[https://github.com/apache/spark/pull/17633]

> Broaden support for Hive partition pruning predicate pushdown
> -------------------------------------------------------------
>
>                 Key: SPARK-20331
>                 URL: https://issues.apache.org/jira/browse/SPARK-20331
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Michael Allman
>             Fix For: 2.3.0
>
>
> Spark 2.1 introduced scalable support for Hive tables with huge numbers of partitions.
> Key to leveraging this support is the ability to prune unnecessary table partitions to answer
> queries. Spark supports a subset of the class of partition pruning predicates that the Hive
> metastore supports. If a user writes a query with a partition pruning predicate that is *not*
> supported by Spark, Spark falls back to loading all partitions and pruning client-side. We
> want to broaden Spark's current partition pruning predicate pushdown capabilities.
> One of the key missing capabilities is support for disjunctions. For example, for a table
> partitioned by date, specifying a predicate like
> {code}date = 20161011 or date = 20161014{code}
> will result in Spark fetching all partitions. For a table partitioned by date and hour,
> querying a range of hours across dates can be quite difficult to accomplish without fetching
> all partition metadata.
> Partition pruning currently supports only comparisons against literals. We
> can expand that to foldable expressions by evaluating them at planning time.
> We can also implement support for the "IN" comparison by expanding it to a sequence of
> "OR"s.
> This ticket covers those enhancements.
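
The three enhancements described in the ticket (folding foldable expressions at planning time, expanding "IN" into a sequence of "OR"s, and pushing down disjunctions) can be illustrated with a small standalone sketch. This is not Spark's implementation and does not use any Spark API: the classes `Attr`, `Lit`, `Add`, `Eq`, `In`, `Or` and the functions `fold`, `expand_in`, `to_filter` are hypothetical names invented for this example, and the rendered filter string only approximates the kind of text a metastore filter might contain.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional, Tuple, Union


# Hypothetical toy expression tree; not Spark's Catalyst classes.
@dataclass(frozen=True)
class Attr:
    name: str


@dataclass(frozen=True)
class Lit:
    value: Union[int, str]


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Eq:
    left: object
    right: object


@dataclass(frozen=True)
class In:
    attr: Attr
    values: Tuple[object, ...]


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def fold(expr):
    """Evaluate attribute-free integer arithmetic down to a literal."""
    if isinstance(expr, Add):
        left, right = fold(expr.left), fold(expr.right)
        if (isinstance(left, Lit) and isinstance(right, Lit)
                and isinstance(left.value, int) and isinstance(right.value, int)):
            return Lit(left.value + right.value)
        return Add(left, right)
    if isinstance(expr, Eq):
        return Eq(fold(expr.left), fold(expr.right))
    if isinstance(expr, In):
        return In(expr.attr, tuple(fold(v) for v in expr.values))
    if isinstance(expr, Or):
        return Or(fold(expr.left), fold(expr.right))
    return expr


def expand_in(expr):
    """Rewrite `attr IN (v1, v2, ...)` as `attr = v1 OR attr = v2 OR ...`."""
    if isinstance(expr, In):
        if not expr.values:
            raise ValueError("IN list must not be empty")
        return reduce(Or, [Eq(expr.attr, v) for v in expr.values])
    if isinstance(expr, Or):
        return Or(expand_in(expr.left), expand_in(expr.right))
    return expr


def to_filter(expr) -> Optional[str]:
    """Render a filter string, or None when the predicate cannot be pushed."""
    if isinstance(expr, Eq) and isinstance(expr.left, Attr) and isinstance(expr.right, Lit):
        value = expr.right.value
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        return f"{expr.left.name} = {rendered}"
    if isinstance(expr, Or):
        left, right = to_filter(expr.left), to_filter(expr.right)
        # A disjunction is only safe to push when both sides are supported.
        if left is None or right is None:
            return None
        return f"({left} or {right})"
    return None


# date IN (20161011, 20161011 + 3): the second value is foldable.
predicate = In(Attr("date"), (Lit(20161011), Add(Lit(20161011), Lit(3))))
pushed = to_filter(expand_in(fold(predicate)))
print(pushed)
```

In this sketch a `None` result from `to_filter` stands in for the fallback the ticket describes, where all partitions are loaded and pruned client-side.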



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

