spark-issues mailing list archives

From "KaiXinXIaoLei (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-22760) When the driver is stopping and some executors are lost because of YarnSchedulerBackend.stop, a problem occurs
Date Tue, 12 Dec 2017 01:29:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-22760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KaiXinXIaoLei updated SPARK-22760:
----------------------------------
    Description: 
Using SPARK-14228, I found a problem:

17/12/11 22:38:33 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Executor for container container_e02_1509517131757_0001_01_000002 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.
17/12/11 22:38:33 ERROR YarnClientSchedulerBackend: Could not find CoarseGrainedScheduler or it has been stopped.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:128)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:231)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:515)
        at org.apache.spark.rpc.RpcEndpointRef.ask(RpcEndpointRef.scala:62)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:392)
        at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:259)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)

I analyzed the cause. When the number of executors is large, YarnSchedulerBackend.stopped is still false while YarnSchedulerBackend.stop() is running. If an executor is stopped in that window, YarnSchedulerBackend.onDisconnected() is still called and tries to remove the executor via the already-stopped scheduler endpoint, which produces the error above.
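
The race can be sketched roughly as follows. This is a simplified, hypothetical illustration (SimplifiedBackend and its helper methods are made-up names, not the actual Spark code): the stopped flag is only set after the driver endpoint has been torn down, so an onDisconnected() that arrives in between still asks the already-stopped endpoint.

// Minimal sketch of the race described above, under the assumptions stated in the text.
// All names here are hypothetical; the real logic lives in YarnSchedulerBackend /
// CoarseGrainedSchedulerBackend.
object StopRaceSketch {
  class SimplifiedBackend {
    @volatile private var stopped = false

    // Rough stand-in for YarnSchedulerBackend.stop(): the driver endpoint is torn
    // down first, and `stopped` is only set afterwards.
    def stop(): Unit = {
      unregisterDriverEndpoint()   // after this, asking the endpoint fails
      stopped = true               // set too late: onDisconnected can slip in before this
    }

    // Rough stand-in for onDisconnected() -> removeExecutor(): if it runs between the
    // two statements above, it still asks the stopped endpoint, which is where
    // "Could not find CoarseGrainedScheduler or it has been stopped" comes from.
    def onDisconnected(executorId: String): Unit = {
      if (!stopped) {
        askDriverToRemoveExecutor(executorId)
      }
    }

    private def unregisterDriverEndpoint(): Unit =
      println("driver endpoint unregistered")
    private def askDriverToRemoveExecutor(id: String): Unit =
      println(s"removeExecutor($id) sent to an already-stopped endpoint")
  }

  def main(args: Array[String]): Unit = {
    val backend = new SimplifiedBackend
    // Executor loss arriving while the driver is shutting down.
    new Thread(() => backend.onDisconnected("exec-1")).start()
    backend.stop()
  }
}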


  was:
When the number of executors is large, YarnSchedulerBackend.stopped is still false while YarnSchedulerBackend.stop() is running. If an executor is stopped in that window, YarnSchedulerBackend.onDisconnected() is still called, and the problem occurs.



> When the driver is stopping and some executors are lost because of YarnSchedulerBackend.stop, a problem occurs
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22760
>                 URL: https://issues.apache.org/jira/browse/SPARK-22760
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 2.2.1
>            Reporter: KaiXinXIaoLei
>
> Using SPARK-14228, I found a problem:
> 17/12/11 22:38:33 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Executor for container container_e02_1509517131757_0001_01_000002 exited because of a YARN event (e.g., pre-emption) and not because of an error in the running job.
> 17/12/11 22:38:33 ERROR YarnClientSchedulerBackend: Could not find CoarseGrainedScheduler or it has been stopped.
> org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
>         at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:163)
>         at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:128)
>         at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:231)
>         at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:515)
>         at org.apache.spark.rpc.RpcEndpointRef.ask(RpcEndpointRef.scala:62)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:392)
>         at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receive$1.applyOrElse(YarnSchedulerBackend.scala:259)
>         at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
>         at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> I analyzed the cause. When the number of executors is large, YarnSchedulerBackend.stopped is still false while YarnSchedulerBackend.stop() is running. If an executor is stopped in that window, YarnSchedulerBackend.onDisconnected() is still called and tries to remove the executor via the already-stopped scheduler endpoint, which produces the error above.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

