spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-16522) [MESOS] Spark application throws exception on exit
Date Thu, 14 Jul 2016 19:07:20 GMT

     [ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Reynold Xin updated SPARK-16522:
--------------------------------
    Description: 
Spark applications running on Mesos throw exception upon exit as follows:
{noformat}
16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor
finished with state FINISHED)] in 3 attempts
org.apache.spark.SparkException: Exception thrown in awaitResult
	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
	... 4 more
Exception in thread "Thread-47" org.apache.spark.SparkException: Error notifying standalone
scheduler's driver endpoint
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Error sending message [message = RemoveExecutor(1,Executor
finished with state FINISHED)]
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
	... 2 more
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
	... 4 more
Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
	... 4 more
{noformat}

Applications' result is not affected by this error.

This issue can be simply reproduced by launching a spark-shell, and exit after running the
following commands:
{code}
val rdd = sc.parallelize(1 to 10, 10)
rdd.map { _ + 1} collect
{code}

The root cause is that in SparkContext.stop(), MesosCoarseGrainedSchedulerBackend.stop() calls
CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop executors and also
stop the driver endpoint without waiting for the actual stop of executors. MesosCoarseGrainedSchedulerBackend.stop()
still waits for the executors to stop in a timeout. During the wait, MesosCoarseGrainedSchedulerBackend.statusUpdate()
generally will be called to update executors' status, and in turn removeExecutor() is called.
But at that time, the driver endpoint is not available.

  was:
Spark applications running on Mesos throw exception upon exit as follows:
{panel}
16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor
finished with state FINISHED)] in 3 attempts
org.apache.spark.SparkException: Exception thrown in awaitResult
	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
	... 4 more
Exception in thread "Thread-47" org.apache.spark.SparkException: Error notifying standalone
scheduler's driver endpoint
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Error sending message [message = RemoveExecutor(1,Executor
finished with state FINISHED)]
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
	... 2 more
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
	... 4 more
Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
	... 4 more
{panel}

Applications' result is not affected by this error.

This issue can be simply reproduced by launching a spark-shell, and exit after running the
following commands:
{code}
val rdd = sc.parallelize(1 to 10, 10)
rdd.map { _ + 1} collect
{code}

The root cause is that in SparkContext.stop(), MesosCoarseGrainedSchedulerBackend.stop() calls
CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop executors and also
stop the driver endpoint without waiting for the actual stop of executors. MesosCoarseGrainedSchedulerBackend.stop()
still waits for the executors to stop in a timeout. During the wait, MesosCoarseGrainedSchedulerBackend.statusUpdate()
generally will be called to update executors' status, and in turn removeExecutor() is called.
But at that time, the driver endpoint is not available.


> [MESOS] Spark application throws exception on exit
> --------------------------------------------------
>
>                 Key: SPARK-16522
>                 URL: https://issues.apache.org/jira/browse/SPARK-16522
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 2.0.0
>            Reporter: Sun Rui
>
> Spark applications running on Mesos throw exception upon exit as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor
finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
> 	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
> 	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
> 	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> 	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> 	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> 	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
> 	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
> 	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
> 	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
> 	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
> 	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
> 	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
> 	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
> 	... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error notifying standalone
scheduler's driver endpoint
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
> 	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
> 	at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = RemoveExecutor(1,Executor
finished with state FINISHED)]
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
> 	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
> 	... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
> 	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
> 	at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
> 	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> 	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> 	at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
> 	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
> 	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
> 	... 4 more
> Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
> 	at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
> 	at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
> 	at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
> 	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
> 	at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
> 	... 4 more
> {noformat}
> Applications' result is not affected by this error.
> This issue can be simply reproduced by launching a spark-shell, and exit after running
the following commands:
> {code}
> val rdd = sc.parallelize(1 to 10, 10)
> rdd.map { _ + 1} collect
> {code}
> The root cause is that in SparkContext.stop(), MesosCoarseGrainedSchedulerBackend.stop()
calls CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop executors and
also stop the driver endpoint without waiting for the actual stop of executors. MesosCoarseGrainedSchedulerBackend.stop()
still waits for the executors to stop in a timeout. During the wait, MesosCoarseGrainedSchedulerBackend.statusUpdate()
generally will be called to update executors' status, and in turn removeExecutor() is called.
But at that time, the driver endpoint is not available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message