spark-issues mailing list archives

From "Alexandru Barbulescu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-27623) Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
Date Thu, 02 May 2019 14:16:00 GMT
Alexandru Barbulescu created SPARK-27623:
--------------------------------------------

             Summary: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
                 Key: SPARK-27623
                 URL: https://issues.apache.org/jira/browse/SPARK-27623
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.2
            Reporter: Alexandru Barbulescu


After updating to Spark 2.4.2, whenever we use the
{code:java}
spark.read.format().options().load()
{code}

chain of methods, we get the following Avro-related error, regardless of which data source is passed to "format":

 
{code:java}
    .options(**load_options)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 172, in load
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o69.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:250)
	at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:248)
	at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
	at scala.collection.TraversableLike.filter(TraversableLike.scala:262)
	at scala.collection.TraversableLike.filter$(TraversableLike.scala:262)
	at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/datasources/FileFormat$class
	at org.apache.spark.sql.avro.AvroFileFormat.<init>(AvroFileFormat.scala:44)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
	... 29 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.FileFormat$class
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 36 more

{code}
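The trace shows {{DataSource.lookupDataSource}} filtering the providers returned by {{java.util.ServiceLoader}} by short name. Because the filter walks every registered {{DataSourceRegister}}, each provider is instantiated during the lookup, which would explain why a single provider that fails to load (here {{AvroFileFormat}}) breaks the lookup for every format. Below is a simplified Python model of that behaviour, not Spark's actual code; {{CsvSource}}, {{BrokenAvroSource}}, {{iterate_providers}} and {{lookup_data_source}} are hypothetical names used only for illustration.

{code:python}
from typing import Any, Callable, Iterable, Iterator, List, Tuple

# (provider class name, zero-argument constructor), in classpath order
Factory = Tuple[str, Callable[[], Any]]


class ServiceConfigurationError(Exception):
    """Stands in for java.util.ServiceConfigurationError in this model."""


class CsvSource:
    """Hypothetical healthy provider."""

    def short_name(self) -> str:
        return "csv"


class BrokenAvroSource:
    """Hypothetical provider whose constructor fails, like AvroFileFormat.<init>."""

    def __init__(self) -> None:
        raise ImportError("org/apache/spark/sql/execution/datasources/FileFormat$class")

    def short_name(self) -> str:
        return "avro"


def iterate_providers(factories: Iterable[Factory]) -> Iterator[Any]:
    # Like ServiceLoader's lazy iterator: a provider is instantiated only when
    # iteration reaches it, and a failing constructor aborts the whole loop.
    for name, factory in factories:
        try:
            yield factory()
        except Exception as exc:
            raise ServiceConfigurationError(
                f"Provider {name} could not be instantiated") from exc


def lookup_data_source(provider: str, factories: Iterable[Factory]) -> List[Any]:
    # Filtering by short name: every provider is built before its name can be
    # compared, so iteration continues even after a match has been found.
    return [p for p in iterate_providers(factories)
            if p.short_name().lower() == provider.lower()]


if __name__ == "__main__":
    print(len(lookup_data_source("csv", [("CsvSource", CsvSource)])))  # 1
    try:
        lookup_data_source("csv", [("CsvSource", CsvSource),
                                   ("BrokenAvroSource", BrokenAvroSource)])
    except ServiceConfigurationError as err:
        print(err)  # Provider BrokenAvroSource could not be instantiated
{code}

In this model the request for "csv" matches the first provider and still fails, because the second provider is constructed while the filter finishes its pass.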
 

The code we run looks like this:

 
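{code:python}
from pyspark.sql import SparkSession

# APPLICATION_NAME, MASTER_URL, SERVER_IP_ADDRESS, CASSANDRA_USERNAME,
# CASSANDRA_PASSWORD, CASSANDRA_KEYSPACE and TABLE_NAME are defined
# elsewhere in our application.
spark_session = (
    SparkSession.builder
    .appName(APPLICATION_NAME)
    .master(MASTER_URL)
    # Spark Cassandra Connector settings
    .config('spark.cassandra.connection.host', SERVER_IP_ADDRESS)
    .config('spark.cassandra.auth.username', CASSANDRA_USERNAME)
    .config('spark.cassandra.auth.password', CASSANDRA_PASSWORD)
    .config('spark.sql.shuffle.partitions', 16)
    .config('parquet.enable.summary-metadata', 'true')
    .getOrCreate())

# Options passed to the Cassandra data source
load_options = {
    'keyspace': CASSANDRA_KEYSPACE,
    'table': TABLE_NAME,
    'spark.cassandra.input.fetch.size_in_rows': '150',
}

# This load() call raises the ServiceConfigurationError shown above
df = (spark_session.read.format('org.apache.spark.sql.cassandra')
      .options(**load_options)
      .load())
{code}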
 

We get exactly the same error when reading a local .avro file instead of a Cassandra table.

Until now, we have added the Spark-Avro .jar file with the spark-submit --jars option. The Spark-Avro version we used, which worked with Spark 2.4.1, was 2.4.0.

To fix the problem, we tried newer versions of the .jar file. We also tried the --packages option with different version combinations, but none of these attempts worked: the same error shows up every time.
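One possible cause worth ruling out (an assumption, not a confirmed diagnosis): {{FileFormat$class}} is the kind of trait implementation class that the Scala 2.11 compiler generates and Scala 2.12 no longer does, and the pre-built Spark 2.4.2 binaries are compiled for Scala 2.12, whereas the 2.4.1 binaries used Scala 2.11. A Spark-Avro artifact built for Scala 2.11 could then fail in {{AvroFileFormat.<init>}} as shown in the trace. The sketch below builds a --packages coordinate whose Scala suffix matches the Scala version reported by {{spark-submit --version}}; the helper names {{scala_binary_version}} and {{spark_avro_package}} are hypothetical.

{code:python}
import re


def scala_binary_version(version_text: str) -> str:
    """Return the Scala binary version (e.g. "2.12") from text containing
    'Scala version X.Y.Z', such as the banner printed by spark-submit --version."""
    match = re.search(r"Scala version (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no Scala version found in {version_text!r}")
    return f"{match.group(1)}.{match.group(2)}"


def spark_avro_package(spark_version: str, scala_version_text: str) -> str:
    """Build a --packages coordinate whose Scala suffix and version match the running Spark."""
    scala = scala_binary_version(scala_version_text)
    return f"org.apache.spark:spark-avro_{scala}:{spark_version}"


if __name__ == "__main__":
    print(spark_avro_package("2.4.2", "Using Scala version 2.12.8"))
    # org.apache.spark:spark-avro_2.12:2.4.2
{code}

The resulting string would be passed as, for example, spark-submit --packages org.apache.spark:spark-avro_2.12:2.4.2 followed by the application script, assuming that artifact is available from the configured repositories.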

When we roll back to Spark 2.4.1 with exactly the same setup and code, the error does not occur and everything works fine.

Any ideas on what could be causing this?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


