spark-issues mailing list archives

From "Russell Alexander Spitzer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-12639) Improve Explain for DataSources with Handled Predicate Pushdowns
Date Fri, 08 Jan 2016 01:41:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-12639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088558#comment-15088558 ]

Russell Alexander Spitzer commented on SPARK-12639:
---------------------------------------------------

https://github.com/apache/spark/pull/10655 [~yhuai]

> Improve Explain for DataSources with Handled Predicate Pushdowns
> ----------------------------------------------------------------
>
>                 Key: SPARK-12639
>                 URL: https://issues.apache.org/jira/browse/SPARK-12639
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Russell Alexander Spitzer
>            Priority: Minor
>
> SPARK-11661 improves handling of predicate pushdowns but has the unintended consequence of making the explain string more confusing.
> It makes it seem as if a source is always pushing down all of the filters (even those it cannot handle).
> This can be confusing (I kept checking my code to see where I had broken something).
> {code: title= "Query plan for source where nothing is handled by C* Source"}
> Filter ((((a#71 = 1) && (b#72 = 2)) && (c#73 = 1)) && (e#75 = 1))
> +- Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@4b9cf75c[a#71,b#72,c#73,d#74,e#75,f#76,g#77,h#78] PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
> {code}
> Although the telltale "Filter" step is present, my first instinct is that the underlying source relation is using all of those filters.
> {code: title = "Query plan for source where everything is handled by C* Source"}
> Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@55d4456c[a#79,b#80,c#81,d#82,e#83,f#84,g#85,h#86] PushedFilters: [EqualTo(a,1), EqualTo(b,2), EqualTo(c,1), EqualTo(e,1)]
> {code}
> I think this would be much clearer if we changed the metadata key to "HandledFilters" and listed only the filters that are handled fully by the underlying source.
> Something like:
> {code: title="Proposed Explain for Pushdown where none of the predicates are handled by the underlying source"}
> Filter ((((a#71 = 1) && (b#72 = 2)) && (c#73 = 1)) && (e#75 = 1))
> +- Scan org.apache.spark.sql.cassandra.CassandraSourceRelation@4b9cf75c[a#71,b#72,c#73,d#74,e#75,f#76,g#77,h#78] HandledFilters: []
> {code}
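
The proposal above can be illustrated with a small, Spark-free sketch. It is not the code from the pull request; `EqualTo`, `handled_filters`, `scan_metadata`, and the `handled_columns` set are hypothetical stand-ins, and the assumption is that a source can say which columns it is able to filter on completely.

```python
from typing import List, NamedTuple, Set


class EqualTo(NamedTuple):
    """Hypothetical stand-in for a pushed-down equality filter (not Spark's class)."""
    attribute: str
    value: int

    def __str__(self) -> str:
        return f"EqualTo({self.attribute},{self.value})"


def handled_filters(pushed: List[EqualTo], handled_columns: Set[str]) -> List[EqualTo]:
    """Keep only the filters the source can evaluate fully on its own.

    Assumption: the source handles a filter if and only if it can filter
    on that filter's column.
    """
    return [f for f in pushed if f.attribute in handled_columns]


def scan_metadata(pushed: List[EqualTo], handled_columns: Set[str]) -> str:
    """Render the proposed "HandledFilters" entry for the Scan node."""
    handled = handled_filters(pushed, handled_columns)
    return "HandledFilters: [" + ", ".join(str(f) for f in handled) + "]"


pushed = [EqualTo("a", 1), EqualTo("b", 2), EqualTo("c", 1), EqualTo("e", 1)]

# Source that handles nothing: Spark still applies every filter itself.
print(scan_metadata(pushed, set()))
# Source that handles every filtered column: no separate Filter step is needed.
print(scan_metadata(pushed, {"a", "b", "c", "e"}))
```

With an empty `handled_columns` set the sketch prints `HandledFilters: []`, matching the proposed plan for a source that handles none of the predicates.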




