spark-issues mailing list archives

From "Yanbo Liang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-9089) Failing to run simple job on Spark Standalone Cluster
Date Fri, 28 Aug 2015 07:01:45 GMT

     [ https://issues.apache.org/jira/browse/SPARK-9089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-9089:
-------------------------------
    Issue Type: Bug  (was: Question)

> Failing to run simple job on Spark Standalone Cluster
> -----------------------------------------------------
>
>                 Key: SPARK-9089
>                 URL: https://issues.apache.org/jira/browse/SPARK-9089
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.4.0
>         Environment: Staging
>            Reporter: Amar Goradia
>            Priority: Critical
>
> We are trying out Spark and, as part of that, we have set up a standalone Spark cluster. As part of testing things out, we simply opened the PySpark shell and ran this simple job: a=sc.parallelize([1,2,3]).count()
> As a result, we are getting errors. We tried googling the error but haven't been able to find an explanation for why we are running into this state. Can somebody please help us look into this issue further and advise us on what we are missing here?
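For reference, the reported steps reduce to a minimal self-contained PySpark script like the sketch below. The master URL is a placeholder for the reporter's standalone cluster; nothing beyond the standard Spark 1.4 PySpark API is assumed.

    # Minimal reproduction sketch of the reported job (Spark 1.4, PySpark).
    # spark://master-host:7077 is a placeholder for the actual standalone master.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("repro-SPARK-9089")
            .setMaster("spark://master-host:7077"))
    sc = SparkContext(conf=conf)
    a = sc.parallelize([1, 2, 3]).count()  # fails with Py4JJavaError on the affected cluster
    print(a)                               # expected output on a healthy cluster: 3
    sc.stop()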
> Here is the full error stack:
> >>> a=sc.parallelize([1,2,3]).count()
> 15/07/16 00:52:15 INFO SparkContext: Starting job: count at <stdin>:1
> 15/07/16 00:52:15 INFO DAGScheduler: Got job 5 (count at <stdin>:1) with 2 output partitions (allowLocal=false)
> 15/07/16 00:52:15 INFO DAGScheduler: Final stage: ResultStage 5(count at <stdin>:1)
> 15/07/16 00:52:15 INFO DAGScheduler: Parents of final stage: List()
> 15/07/16 00:52:15 INFO DAGScheduler: Missing parents: List()
> 15/07/16 00:52:15 INFO DAGScheduler: Submitting ResultStage 5 (PythonRDD[12] at count at <stdin>:1), which has no missing parents
> 15/07/16 00:52:15 INFO TaskSchedulerImpl: Cancelling stage 5
> 15/07/16 00:52:15 INFO DAGScheduler: ResultStage 5 (count at <stdin>:1) failed in Unknown s
> 15/07/16 00:52:15 INFO DAGScheduler: Job 5 failed: count at <stdin>:1, took 0.004963 s
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 972, in count
>     return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 963, in sum
>     return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 771, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/pyspark/rdd.py", line 745, in collect
>     port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/opt/spark/spark-1.4.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.reflect.InvocationTargetException
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
> org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:80)
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
> org.apache.spark.SparkContext.broadcast(SparkContext.scala:1289)
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:874)
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815)
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1419)
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:884)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:815)
> 	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:799)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1419)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
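The innermost frames above point at org.apache.spark.io.CompressionCodec$.createCodec, invoked while the DAGScheduler broadcasts the serialized task binary, so the job aborts before any task reaches an executor. In Spark 1.4 the default spark.io.compression.codec is snappy, which relies on a native library; if that library cannot be loaded, the reflective construction of the codec fails with exactly this InvocationTargetException. As a diagnostic sketch (it is an assumption, not a confirmed diagnosis, that snappy is the culprit in this environment), switching to the pure-JVM lzf codec isolates the problem:

    # Workaround sketch: force the lzf codec, which needs no native library.
    # spark.io.compression.codec is a standard Spark setting; blaming snappy
    # for this particular cluster is an assumption.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("codec-check")
            .set("spark.io.compression.codec", "lzf"))
    sc = SparkContext(conf=conf)
    print(sc.parallelize([1, 2, 3]).count())  # prints 3 if the codec was the issue
    sc.stop()

The same setting can also be passed to the shell directly, e.g. pyspark --conf spark.io.compression.codec=lzf.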



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

