spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-5614) Predicate pushdown through Generate
Date Thu, 12 Feb 2015 04:39:12 GMT

     [ https://issues.apache.org/jira/browse/SPARK-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-5614:
-----------------------------------
    Assignee: Lu Yan

> Predicate pushdown through Generate
> -----------------------------------
>
>                 Key: SPARK-5614
>                 URL: https://issues.apache.org/jira/browse/SPARK-5614
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Lu Yan
>            Assignee: Lu Yan
>             Fix For: 1.3.0
>
>
> Catalyst's optimizer rules currently cannot push predicates through "Generate" nodes. Furthermore, partition pruning in HiveTableScan cannot be applied to queries that involve "Generate". This makes such queries very inefficient.
> For example, the physical plan for the query
> {quote}
> select len, bk
> from s_server lateral view explode(len_arr) len_table as len 
> where len > 5 and day = '20150102';
> {quote}
> where 'day' is a partition column in the metastore, looks like this in the current version of Spark SQL:
> {quote}
> Project [len, bk]
> Filter ((len > "5") && (day = "20150102"))
> Generate explode(len_arr), true, false
> HiveTableScan [bk, len_arr, day], (MetastoreRelation default, s_server, None), None
> {quote}
> Ideally, the plan should look like this:
> {quote}
> Project [len, bk]
> Filter (len > "5")
> Generate explode(len_arr), true, false
> HiveTableScan [bk, len_arr, day], (MetastoreRelation default, s_server, None), Some(day = "20150102")
> {quote} 
> Here the partition pruning predicate is pushed down to the HiveTableScan node.
> I've developed a solution for this issue. If there is no existing plan to address it, I could merge the solution into master.
> There is also a problem with column pruning for "Generate"; I will file a separate issue for that.
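
The rewrite described in the issue can be illustrated with a small toy model. This is only a sketch in Python, not Catalyst code and not the reporter's patch: the classes Predicate, Scan, Generate and Filter and the functions push_through_generate, prune_partitions and optimize are hypothetical names invented for illustration. It assumes every predicate is deterministic and that a conjunct may move below Generate only if it reads none of the generator's output columns.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Predicate:
    text: str                   # human-readable form, e.g. "len > 5"
    references: FrozenSet[str]  # column names the predicate reads


@dataclass(frozen=True)
class Scan:
    table: str
    columns: Tuple[str, ...]
    partition_columns: FrozenSet[str]
    pruning: Tuple[Predicate, ...] = ()  # predicates used to prune partitions


@dataclass(frozen=True)
class Generate:
    generator: str
    output: Tuple[str, ...]  # columns produced by the generator
    child: object


@dataclass(frozen=True)
class Filter:
    conjuncts: Tuple[Predicate, ...]  # implicitly AND-ed together
    child: object


def push_through_generate(plan):
    """Move conjuncts that ignore the generator output below Generate."""
    if not (isinstance(plan, Filter) and isinstance(plan.child, Generate)):
        return plan
    gen = plan.child
    generated = set(gen.output)
    push = tuple(p for p in plan.conjuncts if not (p.references & generated))
    stay = tuple(p for p in plan.conjuncts if p.references & generated)
    if not push:
        return plan
    new_gen = Generate(gen.generator, gen.output, Filter(push, gen.child))
    return Filter(stay, new_gen) if stay else new_gen


def prune_partitions(plan):
    """Fold conjuncts that read only partition columns into the scan."""
    if not (isinstance(plan, Filter) and isinstance(plan.child, Scan)):
        return plan
    scan = plan.child
    prune = tuple(p for p in plan.conjuncts
                  if p.references and p.references <= scan.partition_columns)
    keep = tuple(p for p in plan.conjuncts if p not in prune)
    new_scan = Scan(scan.table, scan.columns, scan.partition_columns,
                    scan.pruning + prune)
    return Filter(keep, new_scan) if keep else new_scan


def optimize(plan):
    """Apply both rewrites top-down over the toy plan."""
    plan = push_through_generate(plan)
    if isinstance(plan, Filter):
        return prune_partitions(Filter(plan.conjuncts, optimize(plan.child)))
    if isinstance(plan, Generate):
        return Generate(plan.generator, plan.output, optimize(plan.child))
    return plan


# The query from the issue: len comes from explode(len_arr), day is a
# partition column of s_server.
scan = Scan("s_server", ("bk", "len_arr", "day"), frozenset({"day"}))
plan = Filter(
    (Predicate("len > 5", frozenset({"len"})),
     Predicate("day = '20150102'", frozenset({"day"}))),
    Generate("explode(len_arr)", ("len",), scan),
)
optimized = optimize(plan)
```

In this toy model the predicate on len stays above Generate because it reads the generator's output, while the predicate on day moves below Generate and ends up in the scan's pruning list, mirroring the desired plan in the issue. Unlike this sketch, a real optimizer would also have to handle non-deterministic predicates and expressions other than simple conjunctions.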


