spark-reviews mailing list archives

From markhamstra <...@git.apache.org>
Subject [GitHub] spark pull request: SPARK-1235: kill the application when DAGSched...
Date Mon, 31 Mar 2014 00:20:39 GMT
Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/186#discussion_r11101486
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -116,21 +119,30 @@ class DAGScheduler(
       private val metadataCleaner =
         new MetadataCleaner(MetadataCleanerType.DAG_SCHEDULER, this.cleanup, env.conf)
     
    -  taskScheduler.setDAGScheduler(this)
    -
       /**
    -   * Starts the event processing actor.  The actor has two responsibilities:
    -   *
    -   * 1. Waits for events like job submission, task finished, task failure etc., and calls
    -   *    [[org.apache.spark.scheduler.DAGScheduler.processEvent()]] to process them.
    -   * 2. Schedules a periodical task to resubmit failed stages.
    -   *
    -   * NOTE: the actor cannot be started in the constructor, because the periodical task references
    -   * some internal states of the enclosing [[org.apache.spark.scheduler.DAGScheduler]] object, thus
    -   * cannot be scheduled until the [[org.apache.spark.scheduler.DAGScheduler]] is fully constructed.
    +   * Starts the event processing actor within the supervisor.  The eventProcessingActor
    +   * waits for events like job submission, task finished, task failure etc., and calls
    +   * [[org.apache.spark.scheduler.DAGScheduler.processEvent()]] to process them.
        */
    -  def start() {
    -    eventProcessActor = env.actorSystem.actorOf(Props(new Actor {
    +  env.actorSystem.actorOf(Props(new Actor {
    +
    +    override val supervisorStrategy =
    +      OneForOneStrategy() {
    +        case x: Exception => {
    +          logError("eventProcesserActor failed due to the error %s; shutting down SparkContext"
    +            .format(x.getMessage))
    +          doCancelAllJobs()
    +          sc.stop()
    +          Stop
    --- End diff --
    
    Right, which may be enough as long as all we are trying to accomplish is a clean shutdown
    of the whole system, not restarting all or part of it while retaining state and partial
    results from running jobs.  TaskManager's messages won't go anywhere but the /deadLetters
    synthetic actor, but I think that's fine as long as we avoid throwing uncaught exceptions
    etc. while trying to shut down.
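
    The supervision decision being discussed can be sketched without pulling in Akka, as a
    plain decision function (a hypothetical illustration with made-up names, not Spark's
    actual code): on any Exception from the event-processing actor, the strategy cancels all
    jobs, stops the context, and returns the Stop directive, after which any messages still
    addressed to the stopped actor are routed to the /deadLetters synthetic actor.

    ```scala
    // Hypothetical sketch of the OneForOneStrategy decision in the diff above.
    // Directive stands in for akka.actor.SupervisorStrategy.Directive.
    sealed trait Directive
    case object Stop extends Directive
    case object Resume extends Directive

    object SupervisionSketch {
      // Flags recorded for illustration; in the real code these would be the
      // side effects of doCancelAllJobs() and sc.stop().
      var cancelledJobs = false
      var contextStopped = false

      // Mirrors the `case x: Exception => ... Stop` branch of the strategy.
      def decide(x: Throwable): Directive = x match {
        case e: Exception =>
          cancelledJobs = true    // stands in for doCancelAllJobs()
          contextStopped = true   // stands in for sc.stop()
          Stop
      }
    }
    ```

    With this shape, `SupervisionSketch.decide(new RuntimeException("boom"))` returns `Stop`
    and flips both flags, matching the "clean shutdown of the whole system" behavior rather
    than a Restart that would try to preserve actor state.
    
    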


