spark-issues mailing list archives

From "Greg Rahn (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-9182) filters (where clause) on DataFrames are not passed through to jdbc source
Date Mon, 20 Jul 2015 22:19:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634163#comment-14634163 ]

Greg Rahn commented on SPARK-9182:
----------------------------------

It also looks like groupBy is not being pushed down to the JDBC source either.
Running
{code}
emp.groupBy("job").agg(count(emp("job"))).show()
{code}
Results in
{code}
LOG:  execute <unnamed>: SELECT "job" FROM emp
{code}
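
A possible workaround sketch, assuming the table argument of read.jdbc accepts a parenthesized subquery with an alias (anything valid in a FROM clause): the aggregation can be written into the source query so that the database performs it. The helper groupCountSubquery, the alias job_counts, and the column name cnt are hypothetical names introduced only for this illustration.

```scala
// Hypothetical helper: builds a parenthesized, aliased subquery that can be
// passed as the JDBC "table" argument so the GROUP BY runs in the database.
def groupCountSubquery(table: String, col: String, alias: String): String =
  s"(SELECT $col, count($col) AS cnt FROM $table GROUP BY $col) AS $alias"

val pushed = groupCountSubquery("emp", "job", "job_counts")
println(pushed)

// In spark-shell, with url and prop defined as in the issue description:
// sqlContext.read.jdbc(url, pushed, prop).show()
```

This only moves the aggregation into the query text by hand; it does not change how Spark plans groupBy on a DataFrame.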

> filters (where clause) on DataFrames are not passed through to jdbc source
> --------------------------------------------------------------------------
>
>                 Key: SPARK-9182
>                 URL: https://issues.apache.org/jira/browse/SPARK-9182
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>            Reporter: Greg Rahn
>
> When running all of these API calls, only the equality filters are passed through to the backend JDBC source. All of the filters in these commands should be pushed down to the JDBC database source.
> {code}
> val url="jdbc:postgresql:grahn"
> val prop = new java.util.Properties
> val emp = sqlContext.read.jdbc(url, "emp", prop)
> emp.filter(emp("sal") === 5000).show()
> emp.filter(emp("sal") < 5000).show()
> emp.filter("sal = 3000").show()
> emp.filter("sal > 2500").show()
> emp.filter("sal >= 2500").show()
> emp.filter("sal < 2500").show()
> emp.filter("sal <= 2500").show()
> emp.filter("sal != 3000").show()
> emp.filter("sal between 3000 and 5000").show()
> emp.filter("ename in ('SCOTT','BLAKE')").show()
> {code}
> The PostgreSQL query log shows that the following statements are run, and that only the equality predicates are passed through.
> {code}
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp WHERE sal = 5000
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp WHERE sal = 3000
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> {code}
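
To make the expected behavior concrete, here is a minimal sketch of how the non-equality filters listed in the issue could be rendered as WHERE clause text for the JDBC source. The types Pred, Cmp, Between, and In and the functions lit and toSql are hypothetical names for illustration only; they are not Spark's classes and this is not how Spark implements pushdown.

```scala
// Hypothetical predicate types for illustration; these are not Spark's classes.
sealed trait Pred
case class Cmp(col: String, op: String, value: Any) extends Pred
case class Between(col: String, lo: Any, hi: Any) extends Pred
case class In(col: String, values: Seq[Any]) extends Pred

// Render a literal: quote strings (doubling embedded single quotes),
// leave every other value as its toString form.
def lit(v: Any): String = v match {
  case s: String => "'" + s.replace("'", "''") + "'"
  case other     => other.toString
}

// Render one predicate as the SQL text that would follow WHERE.
def toSql(p: Pred): String = p match {
  case Cmp(c, op, v)      => s"$c $op ${lit(v)}"
  case Between(c, lo, hi) => s"$c BETWEEN ${lit(lo)} AND ${lit(hi)}"
  case In(c, vs) =>
    val list = vs.map(lit).mkString(",")
    s"$c IN ($list)"
}

// A few of the filters from the issue description.
val examples = Seq(
  Cmp("sal", "<", 5000),
  Cmp("sal", ">=", 2500),
  Between("sal", 3000, 5000),
  In("ename", Seq("SCOTT", "BLAKE")))

examples.foreach(p => println("SELECT ... FROM emp WHERE " + toSql(p)))
```

With pushdown working, the query log would be expected to show WHERE clauses along these lines instead of the unfiltered SELECT statements.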



