spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-18593) JDBCRDD returns incorrect results for filters on CHAR of PostgreSQL
Date Sun, 27 Nov 2016 11:32:59 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15699536#comment-15699536 ]

Apache Spark commented on SPARK-18593:
--------------------------------------

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/16021

> JDBCRDD returns incorrect results for filters on CHAR of PostgreSQL
> -------------------------------------------------------------------
>
>                 Key: SPARK-18593
>                 URL: https://issues.apache.org/jira/browse/SPARK-18593
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 1.6.3
>            Reporter: Durga Prasad Gunturu
>            Priority: Minor
>              Labels: correctness
>
> In Apache Spark 1.6.x, JDBCRDD returns incorrect results for queries that filter on a
> PostgreSQL CHAR column. The root cause is that PostgreSQL returns a space-padded string
> for such a column, so the post-processing filter `Filter (a#0 = A)` evaluates to false.
> Spark 2.0.0 removes the post filter because the predicate is already handled in the
> database by `PushedFilters: [EqualTo(a,A)]`.
> {code}
> scala> val t_char = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new java.util.Properties())
> t_char: org.apache.spark.sql.DataFrame = [a: string]
> scala> val t_varchar = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new java.util.Properties())
> t_varchar: org.apache.spark.sql.DataFrame = [a: string]
> scala> t_char.show
> +----------+
> |         a|
> +----------+
> |A         |
> |AA        |
> |AAA       |
> +----------+
> scala> t_varchar.show
> +---+
> |  a|
> +---+
> |  A|
> | AA|
> |AAA|
> +---+
> scala> t_char.filter(t_char("a")==="A").show
> +---+
> |  a|
> +---+
> +---+
> scala> t_char.filter(t_char("a")==="A         ").show
> +----------+
> |         a|
> +----------+
> |A         |
> +----------+
> scala> t_varchar.filter(t_varchar("a")==="A").show
> +---+
> |  a|
> +---+
> |  A|
> +---+
> scala> t_char.filter(t_char("a")==="A").explain
> == Physical Plan ==
> Filter (a#0 = A)
> +- Scan JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres, password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
> {code}
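
The mismatch described in the issue can be reproduced without Spark or PostgreSQL. The following is a minimal plain-Scala sketch, not the Spark code path: it assumes the column is CHAR(10) (inferred from the width of the `t_char.show` output), and the names `CharPaddingSketch`, `padChar`, `exactMatch`, and `trimmedMatch` are hypothetical. It shows why an exact string comparison against `A` finds nothing once the values are space padded, and that a comparison ignoring trailing spaces would match.

```scala
// Plain-Scala sketch; no Spark or PostgreSQL involved. All names are hypothetical.
object CharPaddingSketch {
  // Mimic a CHAR(n) value as returned to the client: right-padded with spaces.
  def padChar(value: String, width: Int): String = value.padTo(width, ' ')

  // Rows as an assumed CHAR(10) column would return them.
  val rows: Seq[String] = Seq("A", "AA", "AAA").map(padChar(_, 10))

  // Exact equality, analogous to the post-scan `Filter (a#0 = A)`.
  def exactMatch(literal: String): Seq[String] = rows.filter(_ == literal)

  // Comparison that strips trailing spaces before comparing.
  def trimmedMatch(literal: String): Seq[String] =
    rows.filter(_.replaceAll(" +$", "") == literal)

  def main(args: Array[String]): Unit = {
    println(exactMatch("A").size)              // 0: "A" != "A         "
    println(exactMatch(padChar("A", 10)).size) // 1: padded literal matches
    println(trimmedMatch("A").size)            // 1: trailing spaces ignored
  }
}
```

This mirrors the three `t_char` queries above: the unpadded literal returns no rows, while the padded literal returns one.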



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


