From Brad Miller <>
Subject Fwd: pyspark crash on mesos
Date Mon, 03 Mar 2014 18:34:07 GMT
Hi All,

After switching from standalone Spark to Mesos, I'm experiencing some
instability. I'm running pyspark interactively through an IPython
notebook and get this crash non-deterministically (although fairly
reliably within the first 2000 tasks, often much sooner):

Exception in thread "DAGScheduler" org.apache.spark.SparkException: EOF reached before Python server acknowledged
    at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:340)
    at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:311)
    at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:70)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:253)
    at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:251)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
    at org.apache.spark.Accumulators$.add(Accumulators.scala:251)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:662)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:437)
    at org.apache.spark.scheduler.DAGScheduler$$anon$

I'm running the following software versions on all machines:
Spark: 0.8.1  (md5: 5d3c56eaf91c7349886d5c70439730b3)
Mesos: 0.13.0  (md5: 220dc9c1db118bc7599d45631da578b9)
Python: 2.7.3 (Stack Overflow mentioned that differing Python versions may
be to blame, but unless Spark or IPython is specifically invoking an older
version under the hood, mine are all the same.)
Ubuntu: 12.04
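To rule out the mismatched-interpreter theory, a quick check could be run from the notebook. This is a minimal diagnostic sketch, assuming an existing SparkContext named `sc`; the helper names `python_versions_by_host` and `driver_info` are hypothetical, and the default of 100 partitions is an arbitrary choice meant to make it likely (not certain, since Mesos decides task placement) that every slave runs at least one task.

```python
import socket
import sys


def _interpreter_info(_):
    # Runs on a worker. The imports are repeated inside the function so it
    # does not depend on driver-side globals when PySpark ships it out.
    import socket
    import sys
    return (socket.gethostname(), sys.executable, sys.version.split()[0])


def driver_info():
    # The same triple, computed in the driver (notebook) process.
    return (socket.gethostname(), sys.executable, sys.version.split()[0])


def python_versions_by_host(sc, num_slices=100):
    """Return the sorted, distinct (host, interpreter path, Python version)
    triples reported by the tasks of a trivial job spread over
    `num_slices` partitions."""
    rdd = sc.parallelize(range(num_slices), num_slices)
    return sorted(rdd.map(_interpreter_info).distinct().collect())
```

In the notebook, compare `driver_info()` against each row of `python_versions_by_host(sc)`. Any row whose interpreter path or version differs from the driver's would point at the version theory; identical rows everywhere would argue against it.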

I've made the following modification:
I had problems launching the cluster with and
traced the problem to (what seemed to be) a bug in
which used a "--conf" flag that mesos-slave and mesos-master didn't
recognize.  I removed the flag and instead added code to read in
environment variables from then worked as advertised.

In case it's helpful, I've attached several files:
* spark_full_output: output of the ipython process where the SparkContext was created
* Mesos config file from a slave (identical to the master's
except for MESOS_MASTER)
* Spark config file
* mesos-master.INFO: log file from mesos-master
* mesos-master.WARNING: log file from mesos-master
* my modified version of

In case anybody from Berkeley is interested enough to want to interact
with my deployment, my office is in Soda Hall, so that can definitely
be arranged. My apologies if anybody received a duplicate message; I
encountered some delays and complications while joining the list.

-Brad Miller

