spark-issues mailing list archives

From "Cheng Lian (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-6201) INSET should coerce types
Date Mon, 09 Mar 2015 17:11:38 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353030#comment-14353030 ]

Cheng Lian edited comment on SPARK-6201 at 3/9/15 5:10 PM:
-----------------------------------------------------------

Played with Hive's implicit type conversion a bit more and found that Hive actually converts integers
to strings in your case:
{code:sql}
hive> create table t1 as select '1.00' as c1;
hive> select * from t1 where c1 in (1.0);
{code}
If {{c1}} were converted to a numeric type, the row {{1.00}} would appear in the result. However,
the result set is empty. For the expression {{"1.00" IN (1.0)}}, a {{GenericUDFIn}} instance is
created and called with the argument list {{("1.00", 1.0)}}. {{GenericUDFIn}} then tries to
convert all arguments into a common data type, from left to right. Since double can be implicitly
converted to string, {{1.0}} is converted into the string {{"1.0"}}, which does not equal {{"1.00"}}.
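
To make the difference concrete, here is a minimal sketch of the two possible coercions. It is only a toy model: {{InCoercionSketch}}, {{Value}}, {{inAsStrings}} and {{inAsDoubles}} are hypothetical names made up for illustration, not Hive or Spark APIs, and it assumes the double is rendered with its default JVM string form.
{code:scala}
// Hypothetical model of the two possible coercions; none of these names are Hive APIs.
object InCoercionSketch {
  sealed trait Value
  final case class Str(s: String) extends Value
  final case class Dbl(d: Double) extends Value

  // Assumption: a double is rendered with its default JVM string form, so 1.0 becomes "1.0".
  def asString(v: Value): String = v match {
    case Str(s) => s
    case Dbl(d) => d.toString
  }

  // Assumption: a string is parsed as a double, so "1.00" becomes 1.0.
  def asDouble(v: Value): Double = v match {
    case Str(s) => s.toDouble
    case Dbl(d) => d
  }

  // IN evaluated after converting every argument to string (the behavior observed above).
  def inAsStrings(left: Value, candidates: Seq[Value]): Boolean =
    candidates.exists(c => asString(c) == asString(left))

  // IN evaluated after converting every argument to double (the alternative that was ruled out).
  def inAsDoubles(left: Value, candidates: Seq[Value]): Boolean =
    candidates.exists(c => asDouble(c) == asDouble(left))
}
{code}
With these definitions, {{inAsStrings(Str("1.00"), Seq(Dbl(1.0)))}} is false because {{"1.0"}} and {{"1.00"}} differ as strings, which matches the empty result set, while {{inAsDoubles}} with the same arguments is true.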

References:
# [Implicit type coercion support in existing database systems|http://chapeau.freevariable.com/2014/08/existing-system-coercion.html]
by William Benton
# [{{GenericUDFIn.initialize}}|https://github.com/apache/hive/blob/release-0.13.1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIn.java#L84-L100]




was (Author: lian cheng):
Played with Hive's implicit type conversion a bit more and found that Hive actually converts integers
to strings in your case:
{code:sql}
hive> create table t1 as select '1.00' as c1;
hive> select * from t1 where c1 in (1.0);
{code}
If {{c1}} were converted to a numeric type, the row {{1.00}} would appear in the result. However,
the result set is empty.

References:
# [Implicit type coercion support in existing database systems|http://chapeau.freevariable.com/2014/08/existing-system-coercion.html]
by William Benton
# [{{GenericUDFIn.initialize}}|https://github.com/apache/hive/blob/release-0.13.1/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFIn.java#L84-L100]



> INSET should coerce types
> -------------------------
>
>                 Key: SPARK-6201
>                 URL: https://issues.apache.org/jira/browse/SPARK-6201
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.3.0, 1.2.1
>            Reporter: Jianshi Huang
>
> Suppose we have the following table:
> {code}
> sqlc.jsonRDD(sc.parallelize(Seq("{\"a\": \"1\"}", "{\"a\": \"2\"}", "{\"a\": \"3\"}"))).registerTempTable("d")
> {code}
> The schema is
> {noformat}
> root
>  |-- a: string (nullable = true)
> {noformat}
> Then,
> {code}
> sql("select * from d where (d.a = 1 or d.a = 2)").collect
> =>
> Array([1], [2])
> {code}
> where d.a and the constants 1 and 2 are cast to Double before the comparison, as the plan shows:
> {noformat}
> Filter ((CAST(a#155, DoubleType) = CAST(1, DoubleType)) || (CAST(a#155, DoubleType) = CAST(2, DoubleType)))
> {noformat}
> However, if I use
> {code}
> sql("select * from d where d.a in (1,2)").collect
> {code}
> The result is empty.
> The physical plan shows it's using INSET:
> {noformat}
> == Physical Plan ==
> Filter a#155 INSET (1,2)
>  PhysicalRDD [a#155], MappedRDD[499] at map at JsonRDD.scala:47
> {noformat}
> *It seems the INSET implementation in SparkSQL doesn't coerce types implicitly, whereas Hive does. We should make SparkSQL conform to Hive's behavior, even though implicit coercion is very confusing when comparing String and Int.*
> Jianshi



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

