Date: Fri, 31 Jul 2015 07:51:05 +0000 (UTC)
From: "Sebastian YEPES FERNANDEZ (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Updated] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)

[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian YEPES FERNANDEZ updated SPARK-9503:
---------------------------------------------
Description:
Hello,

I have just started using start-mesos-dispatcher and have been noticing some random NPE crashes.

Judging by the exception, it looks like in certain situations "queuedDrivers" is empty, which causes the NPE at "submission.cores":
https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516

{code:title=log|borderStyle=solid}
15/07/30 23:56:44 INFO MesosRestServer: Started REST
server for submitting applications on port 7077
Exception in thread "Thread-1647" java.lang.NullPointerException
    at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
    at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
    at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the driver
I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
{code}

A side effect of this NPE is that after the crash the dispatcher will not start because it is already registered (SPARK-7831):

{code:title=log|borderStyle=solid}
15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081
I0731 09:55:47.715039 8162 sched.cpp:157] Version: 0.23.0
I0731 09:55:47.717013 8163 sched.cpp:254] New master detected at master@192.168.0.254:5050
I0731 09:55:47.717381 8163 sched.cpp:264] No credentials provided.
Attempting to register without authentication
I0731 09:55:47.718246 8177 sched.cpp:819] Got error 'Completed framework attempted to re-register'
I0731 09:55:47.718268 8177 sched.cpp:1625] Asked to abort the driver
15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register
I0731 09:55:47.719091 8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038'
15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
15/07/31 09:55:47 INFO Utils: Shutdown hook called
{code}

I can get around this by removing the ZooKeeper data:

{code:title=zkCli.sh|borderStyle=solid}
rmr /spark_mesos_dispatcher
{code}

was:
Hello,

I have just started using start-mesos-dispatcher and have been noticing some random NPE crashes.

Judging by the exception, it looks like in certain situations "queuedDrivers" is empty, which causes the NPE at "submission.cores":
https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516

{code:title=log|borderStyle=solid}
15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
Exception in thread "Thread-1647" java.lang.NullPointerException
    at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
    at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
    at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the
driver
I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
{code}

A side effect of this NPE is that after the crash the dispatcher will not start because it is already registered (SPARK-7831).

I can get around this by removing the ZooKeeper data:

{code:title=zkCli.sh|borderStyle=solid}
rmr /spark_mesos_dispatcher
{code}

> Mesos dispatcher NullPointerException (MesosClusterScheduler)
> -------------------------------------------------------------
>
> Key: SPARK-9503
> URL: https://issues.apache.org/jira/browse/SPARK-9503
> Project: Spark
> Issue Type: Bug
> Components: Mesos
> Affects Versions: 1.4.1
> Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83
> Reporter: Sebastian YEPES FERNANDEZ
> Labels: mesosphere
>
> Hello,
> I have just started using start-mesos-dispatcher and have been noticing some random NPE crashes.
> Judging by the exception, it looks like in certain situations "queuedDrivers" is empty, which causes the NPE at "submission.cores":
> https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516
> {code:title=log|borderStyle=solid}
> 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
> Exception in thread "Thread-1647" java.lang.NullPointerException
>     at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
>     at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
>     at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
> I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the driver
> I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
> 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> {code}
> A side effect of this NPE is that after the crash the dispatcher will not start because it is already registered (SPARK-7831):
> {code:title=log|borderStyle=solid}
> 15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081
> I0731 09:55:47.715039 8162 sched.cpp:157] Version: 0.23.0
> I0731 09:55:47.717013 8163 sched.cpp:254] New master detected at master@192.168.0.254:5050
> I0731 09:55:47.717381 8163 sched.cpp:264] No credentials provided. Attempting to register without authentication
> I0731 09:55:47.718246 8177 sched.cpp:819] Got error 'Completed framework attempted to re-register'
> I0731 09:55:47.718268 8177 sched.cpp:1625] Asked to abort the driver
> 15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register
> I0731 09:55:47.719091 8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038'
> 15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> 15/07/31 09:55:47 INFO Utils: Shutdown hook called
> {code}
> I can get around this by removing the ZooKeeper data:
> {code:title=zkCli.sh|borderStyle=solid}
> rmr /spark_mesos_dispatcher
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)