spark-issues mailing list archives

From "Bogdan Raducanu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-20744) Predicates with multiple columns do not work
Date Thu, 01 Jun 2017 09:20:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032679#comment-16032679
] 

Bogdan Raducanu commented on SPARK-20744:
-----------------------------------------

An array generally requires all of its elements to be of the same type. Casts are added
automatically where possible, but that is not always the case:

```sql("select array(now(), 1)").show```

```org.apache.spark.sql.AnalysisException: cannot resolve 'array(current_timestamp(), 1)' due to data type mismatch: input to function array should all be the same type, but it's [timestamp, int]; line 1 pos 7;```
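
To make the contrast concrete, here is a minimal sketch of both cases side by side. It assumes a Spark dependency on the classpath and a local session; the object name `ArrayCoercionSketch` and the helper `elementTypeOf` are made up for illustration and are not part of Spark. With `int` and `bigint` elements the analyzer should be able to insert a widening cast, whereas `timestamp` and `int` have no common type, so analysis is expected to fail as in the error above.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

object ArrayCoercionSketch {
  // Hypothetical helper: returns the resolved type of column `arr`, or the failure raised during analysis.
  def elementTypeOf(spark: SparkSession, query: String): Try[String] =
    Try(spark.sql(query).schema("arr").dataType.simpleString)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-coercion-sketch")
      .getOrCreate()

    // int and bigint share a wider common type, so a cast can be added implicitly.
    println(elementTypeOf(spark, "select array(1, cast(2 as bigint)) as arr"))

    // timestamp and int share no common type, so this should come back as a Failure.
    println(elementTypeOf(spark, "select array(now(), 1) as arr"))

    spark.stop()
  }
}
```

Wrapping the call in `Try` is only a convenience so that both cases can be printed in one run; in the shell the second query simply throws the `AnalysisException`.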

> Predicates with multiple columns do not work
> --------------------------------------------
>
>                 Key: SPARK-20744
>                 URL: https://issues.apache.org/jira/browse/SPARK-20744
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
>    +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly, it does not work from SQL either, although other SQL databases support this syntax:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 1),named_struct(col1, 2, col2, 2))
>    +- SubqueryAlias tab1
>       +- Project [id#47L AS a#50L, id#47L AS b#51L]
>          +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Other examples:
> {code}
> scala> sql("select * from tab1 where (a,b) =(1,1)").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' (struct<a:bigint,b:bigint> and struct<col1:int,col2:int>).; line 1 pos 25;
> 'Project [*]
> +- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
>    +- SubqueryAlias tab1
>       +- Project [id#47L AS a#50L, id#47L AS b#51L]
>          +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Expressions such as (1,1) are apparently parsed as structs, and the resulting struct types then do not match (struct<a:bigint,b:bigint> versus struct<col1:int,col2:int>). Perhaps they should be parsed as arrays instead.
> The following code works:
> {code}
> sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
> {code}
> This also works, but requires the cast:
> {code}
> sql("select * from tab1 where (a,b) in (named_struct('a', cast(1 as bigint), 'b', cast(1 as bigint)))").show
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


