spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Fregly <ch...@fregly.com>
Subject Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?
Date Fri, 25 Dec 2015 14:41:06 GMT
Configuring JDBC drivers with Spark is a bit tricky as the JDBC driver
needs to be on the Java System Classpath per this
<http://spark.apache.org/docs/latest/sql-programming-guide.html#troubleshooting>
troubleshooting
section in the Spark SQL programming guide.

Here
<https://github.com/fluxcapacitor/pipeline/blob/master/bin/start-hive-thriftserver.sh>
is an example hive-thrift-server start script from my Spark-based reference
pipeline project.  Here
<https://github.com/fluxcapacitor/pipeline/blob/master/bin/pipeline-spark-sql.sh>
is an example script that decorates the out-of-the-box spark-sql command to
use the MySQL JDBC driver.

These scripts explicitly set --jars to $SPARK_SUBMIT_JARS which is defined
here
<https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L144>
and
here
<https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L87>
and
includes the path to the local MySQL JDBC driver.  This approach is
described here
<http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
in
the Spark docs that describe the advanced spark-submit options.

Any jar specified with --jars will be passed to each worker node in the
cluster - specifically in the work directory for each SparkContext for
isolation purposes.

Cleanup of these jars on the worker nodes is handled by YARN automatically,
and by Spark Standalone per the spark.worker.cleanup.appDataTtl config
param.

The Spark SQL programming guide says to use SPARK_CLASSPATH for this
purpose, but I couldn't get this to work for whatever reason, so i'm
sticking to the --jars approach used in my examples.

On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuild11@gmail.com> wrote:

> Stephen,
>
> Let me confirm. I just need to propagate these settings I put in
> spark-defaults.conf to all the worker nodes? Do I need to do the same with
> the PostgreSQL driver jar file too? If so, is there a way to have it read
> from HDFS rather than copying out to the cluster manually.
>
> Thanks for your help,
> Ben
>
>
> On Tuesday, December 22, 2015, Stephen Boesch <javadba@gmail.com> wrote:
>
>> HI Benjamin,  yes by adding to the thrift server then the create table
>> would work.  But querying is performed by the workers: so you need to add
>> to the classpath of all nodes for reads to work.
>>
>> 2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuild11@gmail.com>:
>>
>>> Hi Stephen,
>>>
>>> I forgot to mention that I added these lines below to the
>>> spark-default.conf on the node with Spark SQL Thrift JDBC/ODBC Server
>>> running on it. Then, I restarted it.
>>>
>>>
>>> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>>
>>> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>>
>>> I read in another thread that this would work. I was able to create the
>>> table and could see it in my SHOW TABLES list. But, when I try to query the
>>> table, I get the same error. It looks like I’m getting close.
>>>
>>> Are there any other things that I have to do that you can think of?
>>>
>>> Thanks,
>>> Ben
>>>
>>>
>>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch <javadba@gmail.com> wrote:
>>>
>>> The postgres jdbc driver needs to be added to the  classpath of your
>>> spark workers.  You can do a search for how to do that (multiple ways).
>>>
>>> 2015-12-22 17:22 GMT-08:00 b2k70 <bbuild11@gmail.com>:
>>>
>>>> I see in the Spark SQL documentation that a temporary table can be
>>>> created
>>>> directly onto a remote PostgreSQL table.
>>>>
>>>> CREATE TEMPORARY TABLE <table_name>
>>>> USING org.apache.spark.sql.jdbc
>>>> OPTIONS (
>>>> url "jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>",
>>>> dbtable "impressions"
>>>> );
>>>> When I run this against our PostgreSQL server, I get the following
>>>> error.
>>>>
>>>> Error: java.sql.SQLException: No suitable driver found for
>>>> jdbc:postgresql://<PostgreSQL_Hostname_IP>/<database_name>
>>>> (state=,code=0)
>>>>
>>>> Can someone help me understand why this is?
>>>>
>>>> Thanks, Ben
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com
>>>> <http://nabble.com>.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>
>>>>
>>>
>>>
>>


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com

Mime
View raw message