spark-issues mailing list archives

From "Paul Wu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-17614) sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
Date Wed, 21 Sep 2016 17:04:21 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510525#comment-15510525 ]

Paul Wu commented on SPARK-17614:
---------------------------------

Thanks. I tried to register my custom dialect as follows, but execution never reaches the
getTableExistsQuery() method. Could anyone help?

import org.apache.spark.sql.jdbc.JdbcDialect;

public class NRSCassandraDialect extends JdbcDialect {

    // Claim every JDBC URL that targets the Cassandra wrapper driver.
    @Override
    public boolean canHandle(String url) {
        System.out.println("came here.." + url.startsWith("jdbc:cassandra"));
        return url.startsWith("jdbc:cassandra");
    }

    // Probe for the table with a LIMIT clause instead of "WHERE 1=0".
    @Override
    public String getTableExistsQuery(String table) {
        System.out.println("query?");
        return "SELECT * from " + table + " LIMIT 1";
    }
}

--------------------------------------------------------------
import java.io.Serializable;
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.jdbc.JdbcDialects;

public class CassJDBC implements Serializable {

    private static final org.apache.log4j.Logger LOGGER = org.apache.log4j.Logger.getLogger(CassJDBC.class);

    private static final String _CONNECTION_URL = "jdbc:cassandra://ulpd326.****.com/test?loadbalancing=DCAwareRoundRobinPolicy(%22datacenter1%22)";
    private static final String _USERNAME = "";
    private static final String _PWD = "";

    private static final SparkSession sparkSession
            = SparkSession.builder()
                    .config("spark.sql.warehouse.dir", "file:///home/zw251y/tmp")
                    .master("local[*]")
                    .appName("Spark2JdbcDs")
                    .getOrCreate();

    public static void main(String[] args) {

        // Register the custom dialect before the first JDBC read.
        JdbcDialects.registerDialect(new NRSCassandraDialect());
        final Properties connectionProperties = new Properties();

        final String dbTable = "sql_demo";

        Dataset<Row> jdbcDF
                = sparkSession.read()
                .jdbc(_CONNECTION_URL, dbTable, connectionProperties);

        jdbcDF.show();
    }
}
--------------------

Error message:
came here..true
parameters = "datacenter1"
Exception in thread "main" java.sql.SQLTransientException: com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
	at com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.<init>(CassandraPreparedStatement.java:108)
	at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
	at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
	at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
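
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in Spark 2.0.0 the read path appears to build its schema probe ("SELECT * FROM <table> WHERE 1=0") directly, and getTableExistsQuery() is consulted only before writes, so overriding it does not change what read().jdbc() sends. Spark 2.1.0 and later expose a separate getSchemaQuery(table) hook on JdbcDialect for this probe. The self-contained Java model below illustrates that split. Dialect, CassandraDialect, dialectFor and schemaProbe are hypothetical stand-ins written for this example, not Spark classes; only the method names canHandle, getTableExistsQuery and getSchemaQuery mirror the real JdbcDialect.

```java
import java.util.Arrays;
import java.util.List;

public class SchemaQuerySketch {

    // Hypothetical stand-in for the dialect hooks; this is NOT Spark's JdbcDialect.
    static class Dialect {
        boolean canHandle(String url) {
            return true;
        }

        // Probe issued on the read path to fetch column metadata without rows.
        String getSchemaQuery(String table) {
            return "SELECT * FROM " + table + " WHERE 1=0";
        }

        // Probe issued before a write to check that the table exists.
        String getTableExistsQuery(String table) {
            return "SELECT * FROM " + table + " WHERE 1=0";
        }
    }

    // Hypothetical Cassandra dialect: CQL rejects "WHERE 1=0", so use LIMIT 1.
    static class CassandraDialect extends Dialect {
        @Override
        boolean canHandle(String url) {
            return url.startsWith("jdbc:cassandra");
        }

        @Override
        String getSchemaQuery(String table) {
            return "SELECT * FROM " + table + " LIMIT 1";
        }

        @Override
        String getTableExistsQuery(String table) {
            return "SELECT * FROM " + table + " LIMIT 1";
        }
    }

    // Pick the first registered dialect that claims the URL, else the default.
    static Dialect dialectFor(String url, List<Dialect> registered) {
        for (Dialect d : registered) {
            if (d.canHandle(url)) {
                return d;
            }
        }
        return new Dialect();
    }

    // Models the read path: only getSchemaQuery is consulted here.
    static String schemaProbe(String url, String table, List<Dialect> registered) {
        return dialectFor(url, registered).getSchemaQuery(table);
    }

    public static void main(String[] args) {
        List<Dialect> registered = Arrays.<Dialect>asList(new CassandraDialect());
        // prints "SELECT * FROM sql_demo LIMIT 1"
        System.out.println(schemaProbe("jdbc:cassandra://host/test", "sql_demo", registered));
        // prints "SELECT * FROM sql_demo WHERE 1=0"
        System.out.println(schemaProbe("jdbc:postgresql://host/test", "sql_demo", registered));
    }
}
```

Under that assumption, a real dialect running on Spark 2.1.0 or later would add a getSchemaQuery override next to the existing getTableExistsQuery override; on 2.0.0 there is no such hook to override.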

> sparkSession.read() .jdbc(***) use the sql syntax "where 1=0" that Cassandra does not support
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17614
>                 URL: https://issues.apache.org/jira/browse/SPARK-17614
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Any Spark Runtime 
>            Reporter: Paul Wu
>            Priority: Minor
>              Labels: cassandra-jdbc, sql
>
> I have code like the following, using the Cassandra JDBC wrapper (https://github.com/adejanovski/cassandra-jdbc-wrapper):
>         final String dbTable = "sql_demo";
>         Dataset<Row> jdbcDF
>                 = sparkSession.read()
>                 .jdbc(CASSANDRA_CONNECTION_URL, dbTable, connectionProperties);
>         List<Row> rows = jdbcDF.collectAsList();
> It threw the error:
> Exception in thread "main" java.sql.SQLTransientException: com.datastax.driver.core.exceptions.SyntaxError: line 1:29 no viable alternative at input '1' (SELECT * FROM sql_demo WHERE [1]...)
> 	at com.github.adejanovski.cassandra.jdbc.CassandraPreparedStatement.<init>(CassandraPreparedStatement.java:108)
> 	at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:371)
> 	at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:348)
> 	at com.github.adejanovski.cassandra.jdbc.CassandraConnection.prepareStatement(CassandraConnection.java:48)
> The reason is that the Spark JDBC code uses the SQL syntax "where 1=0" somewhere (to get the schema?), but Cassandra does not support this syntax, because CQL is not standard SQL. I am not sure how this issue can be resolved.
> The following log shows more information:
> 16/09/20 13:16:35 INFO CassandraConnection  138: Datacenter: %s; Host: %s; Rack: %s
> 16/09/20 13:16:35 TRACE CassandraPreparedStatement  98: CQL: SELECT * FROM sql_demo WHERE 1=0
> 16/09/20 13:16:35 TRACE RequestHandler  71: [19400322] com.datastax.driver.core.Statement$1@41ccb3b9
> 16/09/20 13:16:35 TRACE RequestHandler  272: [19400322-1] Starting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


