spark-issues mailing list archives

From "Dongjoon Hyun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-25816) Functions does not resolve Columns correctly
Date Sun, 28 Oct 2018 17:09:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25816:
----------------------------------
    Affects Version/s: 2.4.0

> Functions does not resolve Columns correctly
> --------------------------------------------
>
>                 Key: SPARK-25816
>                 URL: https://issues.apache.org/jira/browse/SPARK-25816
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>            Reporter: Brian Zhang
>            Priority: Critical
>         Attachments: final_allDatatypes_Spark.avro, source.snappy.parquet
>
>
> When the current DataFrame and the original DataFrame it was selected from both contain a
> column with the same name, Spark 2.3.0 and 2.3.1 do not resolve that column correctly when
> it is used in an expression, which causes a data type mismatch. The same code works in
> Spark 2.2.1.
> The code below reproduces the issue:
> import org.apache.spark._
> import org.apache.spark.rdd._
> import org.apache.spark.storage.StorageLevel._
> import org.apache.spark.sql._
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.catalyst.expressions._
> import org.apache.spark.sql.Column
> val v0 = spark.read.parquet("/data/home/bzinfa/bz/source.snappy.parquet")
> val v00 = v0.toDF(v0.schema.fields.indices.view.map("" + _):_*)
> val v5 = v00.select($"13".as("0"),$"14".as("1"),$"15".as("2"))
> val v5_2 = $"2"
> v5.where(lit(500).<(v5_2(new Column(new MapKeys(v5_2.expr))(lit(0)))))
> // v00's 3rd column ("2") is binary and its 16th column ("15") is map<string, double>
> Error:
> org.apache.spark.sql.AnalysisException: cannot resolve 'map_keys(`2`)' due to data type mismatch: argument 1 requires map type, however, '`2`' is of binary type.;
>
> 'Project [0#1591, 1#1592, 2#1593]
> +- 'Filter (500 < 2#1593[map_keys(2#1561)[0]])
>    +- Project [13#1572 AS 0#1591, 14#1573 AS 1#1592, 15#1574 AS 2#1593, 2#1561]
>       +- Project [c_bytes#1527 AS 0#1559, c_union#1528 AS 1#1560, c_fixed#1529 AS 2#1561, c_boolean#1530 AS 3#1562, c_float#1531 AS 4#1563, c_double#1532 AS 5#1564, c_int#1533 AS 6#1565, c_long#1534L AS 7#1566L, c_string#1535 AS 8#1567, c_decimal_18_2#1536 AS 9#1568, c_decimal_28_2#1537 AS 10#1569, c_decimal_38_2#1538 AS 11#1570, c_date#1539 AS 12#1571, simple_struct#1540 AS 13#1572, simple_array#1541 AS 14#1573, simple_map#1542 AS 15#1574]
>          +- Relation[c_bytes#1527,c_union#1528,c_fixed#1529,c_boolean#1530,c_float#1531,c_double#1532,c_int#1533,c_long#1534L,c_string#1535,c_decimal_18_2#1536,c_decimal_28_2#1537,c_decimal_38_2#1538,c_date#1539,simple_struct#1540,simple_array#1541,simple_map#1542] parquet
>
> (In the original issue, 2#1593 and 2#1561 in the Filter are highlighted in red.)
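
In the plan above, the Filter refers to two different attributes that share the name 2: 2#1593 (the map column selected into v5) and 2#1561 (the binary column of v00). The sketch below is a toy model in plain Scala, not Spark's analyzer. It only illustrates why a bare column name stops identifying a single attribute once both are in scope. The types Attr and ResolveByNameSketch are hypothetical names introduced for illustration, and the sketch makes no claim about how Spark actually picks between the two.

```scala
// Toy model only: Attr and ResolveByNameSketch are hypothetical, not Spark classes.
final case class Attr(name: String, id: Int, dataType: String) {
  override def toString: String = s"$name#$id"
}

object ResolveByNameSketch {
  // Look a column name up in a list of candidate attributes.
  // Exactly one match resolves; zero or several matches are reported as errors.
  def resolve(name: String, candidates: Seq[Attr]): Either[String, Attr] =
    candidates.filter(_.name == name) match {
      case Seq(only) => Right(only)
      case Seq()     => Left(s"cannot resolve '$name'")
      case many =>
        val listed = many.mkString(", ")
        Left(s"'$name' is ambiguous: $listed")
    }

  // Names, IDs and types copied from the report; unrelated columns are omitted.
  val v5Output: Seq[Attr] = Seq(Attr("2", 1593, "map<string,double>"))
  val v00Output: Seq[Attr] = Seq(Attr("2", 1561, "binary"))

  def main(args: Array[String]): Unit = {
    // Only v5's output is visible: the name identifies one attribute.
    println(resolve("2", v5Output))
    // Both outputs are visible: the name alone no longer picks one attribute.
    println(resolve("2", v5Output ++ v00Output))
  }
}
```

In this model the first lookup yields the map-typed attribute, while the second finds two candidates with different types. The reported error corresponds to the binary one being used as the argument of map_keys.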



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


