spark-issues mailing list archives

From "Jason Hubbard (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-2243) Support multiple SparkContexts in the same JVM
Date Sat, 16 Jan 2016 21:30:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103446#comment-15103446 ]

Jason Hubbard commented on SPARK-2243:
--------------------------------------

The difference between N contexts in N JVMs and N contexts in 1 JVM is N-1 JVMs, N-1 resource
manager application requests, and N-1 application masters, plus all of the overhead and complexity
associated with them.  There are trade-offs to both approaches, but I think it is important
to allow the designer to weigh them and decide on each option.

I guess I'm not familiar with the app-container-like things you are referring to, but typically
most applications I see make these assumptions at the task level and not necessarily at the
driver level.  Again, these are trade-offs that it would be great to let the designer weigh,
IMHO.
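
For reference, here is a minimal sketch of what the single-context restriction looks like from application code, assuming Spark 1.x behavior (the app names are illustrative, and spark.driver.allowMultipleContexts only suppresses the guard rather than making a second context safe):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// First context in this JVM: allowed.
val sc1 = new SparkContext(
  new SparkConf().setAppName("batch").setMaster("local[2]"))

// Second context in the same JVM: in Spark 1.x this fails with
// "Only one SparkContext may be running in this JVM (see SPARK-2243)"
// unless the experimental escape hatch below is set.
val sc2 = new SparkContext(
  new SparkConf()
    .setAppName("second-workload")
    .setMaster("local[2]")
    .set("spark.driver.allowMultipleContexts", "true")) // suppresses the check only
{code}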

> Support multiple SparkContexts in the same JVM
> ----------------------------------------------
>
>                 Key: SPARK-2243
>                 URL: https://issues.apache.org/jira/browse/SPARK-2243
>             Project: Spark
>          Issue Type: New Feature
>          Components: Block Manager, Spark Core
>    Affects Versions: 0.7.0, 1.0.0, 1.1.0
>            Reporter: Miguel Angel Fernandez Diaz
>
> We're developing a platform where we create several Spark contexts for carrying out different calculations. Is there any restriction on using several Spark contexts?
> We have two contexts, one for Spark calculations and another one for Spark Streaming jobs. The following error arises when we first execute a Spark calculation and,
> once the execution is finished, a Spark Streaming job is launched:
> {code}
> 14/06/23 16:40:08 ERROR executor.Executor: Exception in task ID 0
> java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
> 	at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
> 	at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
> 	at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
> 	at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
> 	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
> 	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
> 	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
> 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Loss was due to java.io.FileNotFoundException
> java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
> 	at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
> 	at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
> 	at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
> 	at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
> 	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
> 	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
> 	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 14/06/23 16:40:08 ERROR scheduler.TaskSetManager: Task 0.0:0 failed 1 times; aborting job
> 14/06/23 16:40:08 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 14/06/23 16:40:08 INFO scheduler.DAGScheduler: Failed to run runJob at NetworkInputTracker.scala:182
> [WARNING] 
> org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> 	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:385)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/06/23 16:40:09 INFO dstream.ForEachDStream: metadataCleanupDelay = 3600
> {code}
> So far, we are working on localhost. Any clue about where this error is coming from? Any workaround to solve the issue?
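
A common way to avoid the two-context setup described in the issue above is to build the StreamingContext on top of the existing SparkContext, so only one context lives in the JVM. A minimal sketch (the host, port, and batch interval are illustrative only):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// One SparkContext shared by the batch calculation and the streaming job.
val sc = new SparkContext(
  new SparkConf().setAppName("shared-context").setMaster("local[2]"))

// Batch calculation on the shared context.
val total = sc.parallelize(1 to 100).sum()
println(s"batch result: $total")

// Streaming job built on the same SparkContext instead of a second one.
val ssc = new StreamingContext(sc, Seconds(5)) // 5-second batch interval (illustrative)
ssc.socketTextStream("localhost", 9999).count().print()
ssc.start()
ssc.awaitTermination()
{code}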



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

