spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dongjoon Hyun (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-18593) JDBCRDD fails to filter CHAR type column for PostgreSQL
Date Sun, 27 Nov 2016 10:05:58 GMT

     [ https://issues.apache.org/jira/browse/SPARK-18593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Dongjoon Hyun updated SPARK-18593:
----------------------------------
    Description: 
In Apache Spark 1.6.x, JDBCRDD returns incorrect results for a query with filters on CHAR
column with PostgreSQL CHAR type. The root cause is PostgreSQL returns `space padded string`
for a result. So, the post processing filter `Filter (a#0 = A)` is evaluated false. Spark
2.0.0 removes the post filter because it is already handled in the database by `PushedFilters:
[EqualTo(a,A)]`.

{code}
scala> val t_char = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres",
"t_char", new java.util.Properties())
t_char: org.apache.spark.sql.DataFrame = [a: string]

scala> val t_varchar = sqlContext.read.option("user", "postgres").option("password", "rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres",
"t_varchar", new java.util.Properties())
t_varchar: org.apache.spark.sql.DataFrame = [a: string]

scala> t_char.show
+----------+
|         a|
+----------+
|A         |
|AA        |
|AAA       |
+----------+


scala> t_varchar.show
+---+
|  a|
+---+
|  A|
| AA|
|AAA|
+---+


scala> t_char.filter(t_char("a")==="A").show
+---+
|  a|
+---+
+---+


scala> t_char.filter(t_char("a")==="A         ").show
+----------+
|         a|
+----------+
|A         |
+----------+


scala> t_varchar.filter(t_varchar("a")==="A").show
+---+
|  a|
+---+
|  A|
+---+


scala> t_char.filter(t_char("a")==="A").explain
== Physical Plan ==
Filter (a#0 = A)
+- Scan JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres,
password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
{code}

  was:
when using filter using jdbc.. unable to retrieve data if the column value is having only
value of length one. 

for example:  
Below code works because the filter condition data is of length > 1
val dataFromTable = sqlContext.read.jdbc("jdbc:postgresql://localhost:5433/postgres", "stocknames",
defaultProps)
  dataFromTable.filter(dataFromTable("grouptype")==="ZP").show()

result: 

+------------+----------+--------------------+------+---------+---------+------------+--------------------+----------+
|securitycode|securityid|        securityname|status|grouptype|facevalue|      isinno|   
        industry|instrument|
+------------+----------+--------------------+------+---------+---------+------------+--------------------+----------+
|      504370|  ZARDIINV|ARDI INVESTMENT &...|Active|       ZP|     10.0|            |Misc.Commercial
S...|    Equity|
|      512221|     TARCF|TARRIF CINE &amp;...|Active|       ZP|     10.0|INE177E01010|Finance
(includin...|    Equity|
|      512505|    MEENST|MEENAKSHI STEEL I...|Active|       ZP|     10.0|            |Iron
&amp; Steel ...|    Equity|
|      514060|   EVERTEX|Evergreen Textile...|Active|       ZP|     10.0|INE229N01010|   
        Textiles|    Equity|
|      526085|   SGARRES|SAGAR TOURIST RES...|Active|       ZP|     10.0|NA          |   
          Hotels|    Equity|
+------------+----------+--------------------+------+---------+---------+------------+--------------------+----------+

But if the same filter condition code is of length 1, no data is filtered. 

eg: 

 val dataFromTable = sqlContext.read.jdbc("jdbc:postgresql://localhost:5433/postgres", "stocknames",
defaultProps)
  dataFromTable.filter(dataFromTable("grouptype")==="A").show()

not getting any output

+------------+----------+------------+------+---------+---------+------+--------+----------+
|securitycode|securityid|securityname|status|grouptype|facevalue|isinno|industry|instrument|
+------------+----------+------------+------+---------+---------+------+--------+----------+
+------------+----------+------------+------+---------+---------+------+--------+----------+


> JDBCRDD fails to filter CHAR type column for PostgreSQL
> -------------------------------------------------------
>
>                 Key: SPARK-18593
>                 URL: https://issues.apache.org/jira/browse/SPARK-18593
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 1.6.3
>            Reporter: Durga Prasad Gunturu
>            Priority: Minor
>
> In Apache Spark 1.6.x, JDBCRDD returns incorrect results for a query with filters on
CHAR column with PostgreSQL CHAR type. The root cause is PostgreSQL returns `space padded
string` for a result. So, the post processing filter `Filter (a#0 = A)` is evaluated false.
Spark 2.0.0 removes the post filter because it is already handled in the database by `PushedFilters:
[EqualTo(a,A)]`.
> {code}
> scala> val t_char = sqlContext.read.option("user", "postgres").option("password",
"rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_char", new java.util.Properties())
> t_char: org.apache.spark.sql.DataFrame = [a: string]
> scala> val t_varchar = sqlContext.read.option("user", "postgres").option("password",
"rootpass").jdbc("jdbc:postgresql://localhost:5432/postgres", "t_varchar", new java.util.Properties())
> t_varchar: org.apache.spark.sql.DataFrame = [a: string]
> scala> t_char.show
> +----------+
> |         a|
> +----------+
> |A         |
> |AA        |
> |AAA       |
> +----------+
> scala> t_varchar.show
> +---+
> |  a|
> +---+
> |  A|
> | AA|
> |AAA|
> +---+
> scala> t_char.filter(t_char("a")==="A").show
> +---+
> |  a|
> +---+
> +---+
> scala> t_char.filter(t_char("a")==="A         ").show
> +----------+
> |         a|
> +----------+
> |A         |
> +----------+
> scala> t_varchar.filter(t_varchar("a")==="A").show
> +---+
> |  a|
> +---+
> |  A|
> +---+
> scala> t_char.filter(t_char("a")==="A").explain
> == Physical Plan ==
> Filter (a#0 = A)
> +- Scan JDBCRelation(jdbc:postgresql://localhost:5432/postgres,t_char,[Lorg.apache.spark.Partition;@2f65c341,{user=postgres,
password=rootpass})[a#0] PushedFilters: [EqualTo(a,A)]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message