spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-14204) [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
Date Sat, 30 Jul 2016 13:27:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400668#comment-15400668 ]

Apache Spark commented on SPARK-14204:
--------------------------------------

User 'mchalek' has created a pull request for this issue:
https://github.com/apache/spark/pull/14420

> [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-14204
>                 URL: https://issues.apache.org/jira/browse/SPARK-14204
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Kevin McHale
>            Assignee: Kevin McHale
>              Labels: JDBC, SQL
>             Fix For: 1.6.2
>
>
> DataFrameReader JDBC methods throw an IllegalStateException when:
>   1. the JDBC driver is contained in a user-provided jar, and
>   2. the user does not specify which driver to use, but instead allows Spark to determine the driver from the JDBC URL.
> This broke some of our database ETL jobs at @premisedata when we upgraded from 1.6.0 to 1.6.1.
> I have tracked the problem down to a regression introduced in the fix for SPARK-12579: https://github.com/apache/spark/commit/7f37c1e45d52b7823d566349e2be21366d73651f#diff-391379a5ec51082e2ae1209db15c02b3R53
> The issue is that DriverRegistry.register is not called on the executors for a JDBC driver that is derived from the JDBC URL.
> The problem can be demonstrated within spark-shell, provided you're in cluster mode and you've deployed a JDBC driver (e.g. org.postgresql.Driver) via the --jars argument:
> {code}
> import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.createConnectionFactory
> val factory = createConnectionFactory("jdbc:postgresql://whatever.you.want/database?user=user&password=password", new java.util.Properties)
> sc.parallelize(1 to 100).foreach { _ => factory() } // throws exception
> {code}
> A sufficient fix is to apply DriverRegistry.register to the `driverClass` variable, rather than to `userSpecifiedDriverClass`, at the code link provided above. I will submit a PR for this shortly.
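> A hedged sketch of what that change amounts to (paraphrased from the description above, not the exact Spark source; the surrounding logic is an assumption):
> {code}
> import java.sql.{Connection, DriverManager}
> import java.util.Properties
> import org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry
>
> def createConnectionFactory(url: String, properties: Properties): () => Connection = {
>   // Resolved on the driver node; None when the user relies on URL-based lookup.
>   val userSpecifiedDriverClass = Option(properties.getProperty("driver"))
>   userSpecifiedDriverClass.foreach(DriverRegistry.register)
>   val driverClass: String = userSpecifiedDriverClass.getOrElse {
>     DriverManager.getDriver(url).getClass.getCanonicalName
>   }
>   () => {
>     // This closure runs on the executors. Registering driverClass (always populated)
>     // rather than userSpecifiedDriverClass (None for URL-derived drivers) is the fix.
>     DriverRegistry.register(driverClass)
>     DriverManager.getConnection(url, properties)
>   }
> }
> {code}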
> In the meantime, a temporary workaround is to manually specify the JDBC driver class in the Properties object passed to DataFrameReader.jdbc, or in the options used in other entry points, which will force the executors to register the class properly.
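> For example, forcing the driver class in spark-shell (the table name here is a placeholder):
> {code}
> import java.util.Properties
>
> val props = new Properties()
> props.setProperty("user", "user")
> props.setProperty("password", "password")
> // Naming the driver explicitly forces DriverRegistry.register on the executors.
> props.setProperty("driver", "org.postgresql.Driver")
>
> val df = sqlContext.read.jdbc("jdbc:postgresql://whatever.you.want/database", "some_table", props)
> {code}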



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

