spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Commented] (SPARK-5235) org.apache.spark.sql.SQLConf
Date Wed, 14 Jan 2015 11:34:34 GMT


Sean Owen commented on SPARK-5235:

Is {{SQLContext}} really supposed to be {{Serializable}} to begin with? Given that it contains
a {{SparkContext}}, it doesn't seem like it should be, since that class is not. I know
the {{SparkContext}} is {{@transient}}, so is this another case of making a class {{Serializable}}
to work around people incorrectly pulling the {{SQLContext}} into a function closure? Given
that a serialized {{SQLContext}} doesn't appear to work (the {{SparkContext}} is {{null}}),
I suppose I'm just wondering out loud whether it's a good idea to push this further, rather
than having people write functions that don't serialize {{SQLContext}}, since it won't actually
be usable. For example, I assume the problem here can be fixed by just not serializing {{SQLContext}}.
Right? Or do I misunderstand the purpose?
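The behavior at issue here can be illustrated without Spark at all, since it is plain JVM serialization semantics. The sketch below uses hypothetical stand-in classes ({{FakeConf}}, {{BadContext}}, {{PatchedContext}}); none of them are Spark classes. It shows both halves of the discussion: a {{Serializable}} class holding a non-serializable field fails with {{NotSerializableException}} (the reported bug), and marking the field {{transient}} makes serialization succeed but leaves the field {{null}} after deserialization (why a serialized context isn't actually usable).

```java
import*;

// Minimal sketch of the serialization behavior discussed above.
// FakeConf, BadContext, and PatchedContext are hypothetical stand-ins
// for SQLConf and SQLContext; this is plain JVM serialization, not Spark.
public class TransientDemo {

    // Stand-in for SQLConf: does not implement Serializable.
    static class FakeConf {}

    // Stand-in for the reported bug: a Serializable class holding a
    // non-serializable, non-transient field. Serializing an instance
    // throws NotSerializableException, like the stack trace in this issue.
    static class BadContext implements Serializable {
        FakeConf conf = new FakeConf();
    }

    // The @transient-style workaround: serialization now succeeds, but
    // the field comes back null after deserialization, which is why a
    // serialized context is not usable on the receiving side.
    static class PatchedContext implements Serializable {
        transient FakeConf conf = new FakeConf();
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static boolean canSerialize(Object o) {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(canSerialize(new BadContext()));      // false
        System.out.println(canSerialize(new PatchedContext()));  // true

        // Round-trip the patched version: it deserializes fine, but the
        // transient field is null in the copy.
        ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(serialize(new PatchedContext())));
        PatchedContext copy = (PatchedContext) in.readObject();
        System.out.println(copy.conf == null);                   // true
    }
}
```

The same applies to closures: a lambda or anonymous function that captures a context object drags that object into the serialized task, so the usual fix is to pull the needed values out into locals before the closure, rather than making ever more classes {{Serializable}}.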

> org.apache.spark.sql.SQLConf
> --------------------------------------------------------------
>                 Key: SPARK-5235
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Alex Baretta
> The SQLConf field in SQLContext is neither Serializable nor transient. Here's the stack
trace I get when running SQL queries against a Parquet file.
> Exception in thread "Thread-43" org.apache.spark.SparkException: Job aborted due to stage
failure: Task not serializable: org.apache.spark.sql.SQLConf
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1195)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1184)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1183)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1183)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:843)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:779)
>         at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:763)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1364)
>         at$class.aroundReceive(Actor.scala:465)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1356)
>         at
>         at
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>         at
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(

This message was sent by Atlassian JIRA

