spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Commented] (SPARK-5235) org.apache.spark.sql.SQLConf
Date Wed, 14 Jan 2015 16:52:35 GMT


Sean Owen commented on SPARK-5235:

[~alexbaretta] It certainly may not be your code, of course; by "people" I mean to include the Spark code itself. But surely the problem is solved exactly by not trying to serialize {{SQLContext}}, no? Despite its declaration, as you've demonstrated, it does not serialize, and given the {{@transient}} field it was not designed to be used after serialization anyway.
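The behavior described above can be reproduced outside Spark. The following is a minimal Scala sketch (the class names {{Conf}}, {{BadContext}}, and {{OkContext}} are hypothetical stand-ins for {{SQLConf}} and {{SQLContext}}, not Spark code): a class declared {{Serializable}} that holds a non-serializable, non-transient field fails at write time with the same kind of {{NotSerializableException}} seen in the report, while an {{@transient}} field is skipped during serialization and comes back null, which is why such an object is not usable after deserialization.

```scala
import{ByteArrayInputStream, ByteArrayOutputStream,
                NotSerializableException, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for SQLConf: a plain class, NOT Serializable.
class Conf(val setting: String)

// Like the reported SQLContext: declared Serializable, but holding a
// non-serializable, non-transient field -- writeObject fails at runtime.
class BadContext extends Serializable {
  val conf = new Conf("x")
}

// With @transient the field is skipped during serialization, so writing
// succeeds -- but the field is null after deserialization, so the object
// is not designed to be *used* on the other side.
class OkContext extends Serializable {
  @transient val conf = new Conf("x")
}

object SerDemo {
  // Serialize an object to bytes and read it back, like shipping a task.
  def roundTrip(o: AnyRef): AnyRef = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(o)
    oos.close()
    new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray)).readObject()
  }

  def main(args: Array[String]): Unit = {
    try {
      roundTrip(new BadContext)
      println("BadContext: serialized")
    } catch {
      // Mirrors "Task not serializable: org.apache.spark.sql.SQLConf"
      case _: NotSerializableException => println("BadContext: NotSerializableException")
    }
    val back = roundTrip(new OkContext).asInstanceOf[OkContext]
    println(s"OkContext round-tripped; conf is null: ${back.conf == null}")
  }
}
```

Running this prints {{BadContext: NotSerializableException}} followed by {{OkContext round-tripped; conf is null: true}}, which is the same trade-off at issue here: making the field transient silences the exception but leaves the context unusable after the round trip.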

You've suggested a reasonable band-aid on a band-aid, but I would rather either fix the cause or understand why it's actually supposed to act this way. Other contexts in Spark are not supposed to be serialized. Where I've seen this pattern before, in the unit tests, it was certainly a hack for convenience that didn't matter much because it was just a test.

Can you run with {{}}? This will show exactly what held the reference to {{SQLContext}}.

> org.apache.spark.sql.SQLConf
> --------------------------------------------------------------
>                 Key: SPARK-5235
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Alex Baretta
> The SQLConf field in SQLContext is neither Serializable nor transient. Here's the stack trace I get when running SQL queries against a Parquet file.
> Exception in thread "Thread-43" org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: org.apache.spark.sql.SQLConf
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1195)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1184)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1183)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1183)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:843)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:779)
>         at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:763)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1364)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1356)
>         at
>         at
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>         at
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
>         at

This message was sent by Atlassian JIRA
