spark-issues mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (SPARK-10978) Allow PrunedFilterScan to eliminate predicates from further evaluation
Date Wed, 11 Nov 2015 18:13:11 GMT

     [ https://issues.apache.org/jira/browse/SPARK-10978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-10978:
-----------------------------
    Comment: was deleted

(was: Thanks for the test! I think there is a bug.)

> Allow PrunedFilterScan to eliminate predicates from further evaluation
> ----------------------------------------------------------------------
>
>                 Key: SPARK-10978
>                 URL: https://issues.apache.org/jira/browse/SPARK-10978
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 1.3.0, 1.4.0, 1.5.0
>            Reporter: Russell Alexander Spitzer
>            Assignee: Cheng Lian
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> Currently PrunedFilterScan allows implementors to push down predicates to an
> underlying data source. This is done solely as an optimization, since the predicate
> is reapplied on the Spark side as well. This allows for bloom-filter-like operations,
> but it results in a redundant scan for sources that can perform accurate pushdowns.
> In addition, it makes it difficult for underlying sources to accept queries that
> reference non-existent columns in order to provide ancillary functionality. In our
> case we allow a Solr query to be passed in via a non-existent solr_query column.
> Since this column is not actually returned, nothing passes when Spark re-applies
> the filter on "solr_query".
> Suggestion on the mailing list from [~marmbrus]:
> {quote}
> We have to try and maintain binary compatibility here, so probably the easiest thing
> to do here would be to add a method to the class. Perhaps something like:
> def unhandledFilters(filters: Array[Filter]): Array[Filter] = filters
> By default, this could return all filters so behavior would remain the same, but
> specific implementations could override it. There is still a chance that this would
> conflict with existing methods, but hopefully that would not be a problem in practice.
> {quote}
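
For illustration only (this sketch is not part of the original message or of any connector code): below is a minimal Scala example of how the quoted suggestion could look on a Spark 1.x data source. Note that the trait is spelled PrunedFilteredScan in the Spark 1.x sources API. The class name ExampleRelation, the handledExactly helper, and the rule that only equality filters on an "id" column are handled exactly are assumptions made for this sketch; only the unhandledFilters signature comes from the quote above.

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, EqualTo, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation backed by a fictional external store.
class ExampleRelation(override val sqlContext: SQLContext,
                      override val schema: StructType)
  extends BaseRelation with PrunedFilteredScan {

  // Assumed rule for this sketch: only equality filters on an "id" column
  // can be evaluated exactly by the external store.
  private def handledExactly(f: Filter): Boolean = f match {
    case EqualTo("id", _) => true
    case _                => false
  }

  // Push the filters we understand down to the store. In Spark 1.x every
  // filter is re-applied on the Spark side anyway, which is the redundant
  // evaluation described in this ticket.
  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    val pushed = filters.filter(handledExactly)
    // Placeholder: a real source would query its backend using `pushed`
    // and `requiredColumns` here.
    sqlContext.sparkContext.emptyRDD[Row]
  }

  // The proposed addition: report only the filters that were NOT fully
  // handled, so Spark can skip re-evaluating the rest. Returning `filters`
  // unchanged preserves the current behavior. (Against Spark 1.6+, where
  // this method exists on BaseRelation, it would need the override modifier.)
  def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(handledExactly)
}

With such a method in place, Spark would re-evaluate only the filters returned by unhandledFilters rather than all of them, and a source like the Solr-backed one described above could report the filter on the virtual solr_query column as handled so the Spark-side re-check no longer drops every row.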



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

