I am using Cassandra-Spark connector to pull data from Cassandra, process
it and write it back to Cassandra.
Now I am getting the following exception, and apparently it is Kryo
serialisation. Does anyone what is the reason and how this can be solved?
I also tried to register "org.apache.spark.sql.cassandra.CassandraSQLRow"
in "kryo.register" , but even this did not solve the problem and exception
remains.
WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 7,
ip-X-Y-Z): com.esotericsoftware.kryo.KryoException: Unable to find class:
org.apache.spark.sql.cassandra.CassandraSQLRow
Serialization trace:
_2 (org.apache.spark.util.MutablePair)
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:599)
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1171)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1218)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
I am using Spark 1.1.0 with cassandra-spark connector 1.1.0 , here is the
build:
"org.apache.spark" % "spark-mllib_2.10" % "1.1.0"
exclude("com.google.guava", "guava"),
"com.google.guava" % "guava" % "16.0" % "provided",
"com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0"
exclude("com.google.guava", "guava") withSources() withJavadoc(),
"org.apache.cassandra" % "cassandra-all" % "2.1.1"
exclude("com.google.guava", "guava") ,
"org.apache.cassandra" % "cassandra-thrift" % "2.1.1"
exclude("com.google.guava", "guava") ,
"com.datastax.cassandra" % "cassandra-driver-core" % "2.1.2"
exclude("com.google.guava", "guava") ,
"org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
exclude("com.google.guava", "guava") exclude("org.apache.hadoop", "hadoop
-core"),
"org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided"
exclude("com.google.guava", "guava"),
"org.apache.spark" %% "spark-catalyst" % "1.1.0" % "provided"
exclude("com.google.guava", "guava") exclude("org.apache.spark",
"spark-core"),
"org.apache.spark" %% "spark-sql" % "1.1.0" % "provided"
exclude("com.google.guava", "guava") exclude("org.apache.spark",
"spark-core"),
"org.apache.spark" %% "spark-hive" % "1.1.0" % "provided"
exclude("com.google.guava", "guava") exclude("org.apache.spark",
"spark-core"),
"org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided",
best,
/Shahab
|