spark-issues mailing list archives

From "Li Yuanjian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24210) incorrect handling of boolean expressions when using column in expressions in pyspark.sql.DataFrame filter function
Date Mon, 04 Jun 2018 15:34:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500383#comment-16500383
] 

Li Yuanjian commented on SPARK-24210:
-------------------------------------

I think this may not be a bug.
#KO: returns r1 and r3
ex.filter(('c1 = 1') and ('c2 = 1')).show()
This is caused by Python's own 'and' operator on plain strings, not by Spark. With two non-empty
strings, 'and' evaluates to the second operand, so df.filter only receives 'c2 = 1'.
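
The behaviour of Python's operators on plain strings can be checked without Spark. The snippet below is a minimal sketch in plain Python; the variable names are illustrative only. It also shows why the '&' variant in the report raises an exception instead of filtering.

```python
# Plain Python, no Spark involved: what the two operators do to str operands.
first, second = "c1 = 1", "c2 = 1"

# "and" returns the first operand if it is falsy, otherwise the second one.
picked = first and second
print(picked)  # c2 = 1

# "&" is not defined for str, so this fails before filter() is ever called.
try:
    first & second
    amp_error = None
except TypeError as exc:
    amp_error = exc
print(type(amp_error).__name__)  # TypeError
```
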
#KO: returns r0 and r3
ex.filter('c1 = 1 & c2 = 1').show()
#KO: returns r0 and r3
ex.filter('c1 == 1 & c2 == 1').show()
As you mentioned, [https://github.com/apache/spark/pull/6961] fixes '&' between Column
objects, but it does not apply to a string expression like 'c1 = 1 & c2 = 1'. Inside a SQL
string, '&' is the bitwise AND operator and binds more tightly than '=', so for
ex.filter('c1 = 1 & c2 = 1') Spark parses it as a valueExpression like:
'Filter (('a = (1 & 'b)) = 1). I think this makes sense here.
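
To see why that parse yields r0 and r3, the grouping can be emulated in plain Python on the four rows from the report. This is only a sketch of the arithmetic, not Spark's evaluator: the function names are hypothetical, and Python's True == 1 is used as a stand-in for Spark comparing a boolean with the literal 1.

```python
# Rows from the report: (row, c1, c2)
rows = [("r0", 0, 0), ("r1", 0, 1), ("r2", 1, 0), ("r3", 1, 1)]

def parsed_like_ampersand(c1, c2):
    # Mirrors the grouping ((c1 = (1 & c2)) = 1): bitwise AND first,
    # then the two equality comparisons from left to right.
    return (c1 == (1 & c2)) == 1

def intended(c1, c2):
    # What 'c1 = 1 and c2 = 1' expresses.
    return c1 == 1 and c2 == 1

ampersand_rows = [name for name, c1, c2 in rows if parsed_like_ampersand(c1, c2)]
and_rows = [name for name, c1, c2 in rows if intended(c1, c2)]

print(ampersand_rows)  # ['r0', 'r3']
print(and_rows)        # ['r3']
```
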

> incorrect handling of boolean expressions when using column in expressions in pyspark.sql.DataFrame
filter function
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24210
>                 URL: https://issues.apache.org/jira/browse/SPARK-24210
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.2
>            Reporter: Michael H
>            Priority: Major
>
> {code:python}
> ex = spark.createDataFrame([
>     ('r0', 0, 0),
>     ('r1', 0, 1),
>     ('r2', 1, 0),
>     ('r3', 1, 1)]\
>   , "row: string, c1: int, c2: int")
> #KO: returns r1 and r3
> ex.filter(('c1 = 1') and ('c2 = 1')).show()
> #OK, raises an exception
> ex.filter(('c1 == 1') & ('c2 == 1')).show()
> #KO: returns r0 and r3
> ex.filter('c1 = 1 & c2 = 1').show()
> #KO: returns r0 and r3
> ex.filter('c1 == 1 & c2 == 1').show()
> #OK: returns r3 only
> ex.filter('c1 = 1 and c2 = 1').show()
> #OK: returns r3 only
> ex.filter('c1 == 1 and c2 == 1').show()
> {code}
> When building the expressions using {code}ex.c1{code} or {code}ex['c1']{code}, the problem
> does not occur.
> Issue seems related with
> https://github.com/apache/spark/pull/6961
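
The Column-based form mentioned at the end of the report could look roughly like the sketch below. The helper name filter_both_equal_one is hypothetical; it assumes a DataFrame with integer columns c1 and c2, such as ex above, and an installed pyspark.

```python
def filter_both_equal_one(df):
    """Keep rows where c1 == 1 AND c2 == 1, using Column expressions."""
    # Imported lazily so the helper can be defined without pyspark loaded.
    from pyspark.sql import functions as F

    # Each comparison needs its own parentheses: in Python, & binds more
    # tightly than ==, so omitting them changes the meaning of the expression.
    return df.filter((F.col("c1") == 1) & (F.col("c2") == 1))
```

With the ex DataFrame from the report, filter_both_equal_one(ex).show() would be expected to list only r3.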



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


