spark-issues mailing list archives

From "Sebastian YEPES FERNANDEZ (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)
Date Fri, 31 Jul 2015 07:51:05 GMT

     [ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sebastian YEPES FERNANDEZ updated SPARK-9503:
---------------------------------------------
    Description: 
Hello,

I have just started using start-mesos-dispatcher and have been noticing some random crashes with NPEs.

Looking at the exception, it appears that in certain situations the "queuedDrivers" queue is empty, which leads to the NPE on the "submission.cores" access:

https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516
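The linked lines iterate the queued submissions and dereference fields such as "submission.cores" on each entry. A minimal, hypothetical sketch of that pattern (not the actual Spark source; the {{Submission}} class and the null guard below are purely illustrative) shows how a missing entry in the buffer produces exactly this kind of NPE, and how a defensive guard avoids it:

```scala
// Hypothetical sketch, NOT the actual MesosClusterScheduler code:
// `Submission` and both schedule methods are illustrative stand-ins
// for the pattern at the linked lines.
import scala.collection.mutable.ArrayBuffer

final case class Submission(id: String, cores: Double)

object SchedulerSketch {
  // Unguarded access, mirroring the stack trace: dereferencing each
  // entry's fields throws an NPE as soon as the queue yields a null.
  def scheduleTasksUnsafe(candidates: Seq[Submission]): Seq[Double] =
    candidates.map(submission => submission.cores)

  // Defensive variant: skip null entries instead of crashing.
  def scheduleTasks(candidates: Seq[Submission]): Seq[Double] =
    candidates.collect { case s if s != null => s.cores }

  def main(args: Array[String]): Unit = {
    val queuedDrivers = ArrayBuffer(Submission("driver-20150730-0001", 1.0), null)
    // The guarded version survives the bad entry; the unsafe one would NPE.
    println(scheduleTasks(queuedDrivers))
  }
}
```

Whether the real trigger is a null entry or a concurrent mutation of "queuedDrivers" during iteration is not clear from the trace alone; the sketch only illustrates the failure mode the stack trace points at.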

{code:title=log|borderStyle=solid}
15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on
port 7077
Exception in thread "Thread-1647" java.lang.NullPointerException
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
I0731 00:53:52.969518  7014 sched.cpp:1625] Asked to abort the driver
I0731 00:53:52.969895  7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
{code}

A side effect of this NPE is that, after the crash, the dispatcher will not start again because it is already registered (see SPARK-7831):
{code:title=log|borderStyle=solid}
15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081
I0731 09:55:47.715039  8162 sched.cpp:157] Version: 0.23.0
I0731 09:55:47.717013  8163 sched.cpp:254] New master detected at master@192.168.0.254:5050
I0731 09:55:47.717381  8163 sched.cpp:264] No credentials provided. Attempting to register
without authentication
I0731 09:55:47.718246  8177 sched.cpp:819] Got error 'Completed framework attempted to re-register'
I0731 09:55:47.718268  8177 sched.cpp:1625] Asked to abort the driver
15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted
to re-register
I0731 09:55:47.719091  8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038'
15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
15/07/31 09:55:47 INFO Utils: Shutdown hook called
{code}

I can work around this by removing the ZooKeeper data:
{code:title=zkCli.sh|borderStyle=solid}
rmr /spark_mesos_dispatcher
{code}


  was:
Hello,

I have just started using start-mesos-dispatcher and have been noticing that some random crashes
NPE's

By looking at the exception it looks like in certain situations the "queuedDrivers" is empty
and causes the NPE "submission.cores"

https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516

{code:title=log|borderStyle=solid}
15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on
port 7077
Exception in thread "Thread-1647" java.lang.NullPointerException
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
I0731 00:53:52.969518  7014 sched.cpp:1625] Asked to abort the driver
I0731 00:53:52.969895  7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
{code}

A side effect of this NPE is that after the crash the dispatcher will not start because its
already registered #SPARK-7831
I can get around this by removing the zk data:
{code:title=zkCli.sh|borderStyle=solid}
rmr /spark_mesos_dispatcher
{code}



> Mesos dispatcher NullPointerException (MesosClusterScheduler)
> -------------------------------------------------------------
>
>                 Key: SPARK-9503
>                 URL: https://issues.apache.org/jira/browse/SPARK-9503
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 1.4.1
>         Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83
>            Reporter: Sebastian YEPES FERNANDEZ
>              Labels: mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

