spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
Date Thu, 05 Mar 2015 17:12:38 GMT

     [ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-1867.
------------------------------
    Resolution: Not a Problem

OK, I think there are two problems with the same manifestation here, but neither is ultimately a Spark issue per se:

1) A CDH 5.2 packaging issue, it seems, which is of course not a Spark thing.
2) An issue that appears when using a home-built Spark, which I can't reproduce myself and which does not seem to happen with the normal release.

I'd like to re-resolve this, then, if that's the extent of what we know.
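A quick way to tell these two cases apart is to check where Spark's classes are actually loaded from on each JVM. What follows is an editorial sketch, not from this thread (WhereIsSpark and sparkJarLocation are made-up names, and spark.master is assumed to be supplied externally): it prints the jar that provides SparkContext on the driver and on every executor, so a home-built or distro-packaged Spark that differs from the release shows up immediately.

// Editorial sketch: report which jar supplies Spark's classes on the
// driver and on every executor. Any disagreement points at a packaging
// or home-built-vs-release mismatch rather than at application code.
import org.apache.spark.{SparkConf, SparkContext}

object WhereIsSpark {
  // getCodeSource can be null for bootstrap-classloader classes,
  // but Spark's classes always come from a jar on the classpath.
  def sparkJarLocation: String =
    classOf[SparkContext].getProtectionDomain.getCodeSource.getLocation.toString

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("where-is-spark"))
    println("driver loads Spark from: " + sparkJarLocation)
    // One small task per partition; each reports its executor's view.
    sc.parallelize(1 to 16, 16)
      .map(_ => sparkJarLocation)
      .distinct()
      .collect()
      .foreach(loc => println("an executor loads Spark from: " + loc))
    sc.stop()
  }
}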

> Spark Documentation Error causes java.lang.IllegalStateException: unread block data
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-1867
>                 URL: https://issues.apache.org/jira/browse/SPARK-1867
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: sam
>
> I've employed two System Administrators on a contract basis (for quite a bit of money), and both contractors have independently hit the following exception. What we are doing is:
> 1. Installing Spark 0.9.1 according to the documentation on the website, along with the CDH4 (and, on another cluster, CDH5) distros of hadoop/hdfs.
> 2. Building a fat jar of a Spark app with sbt, then trying to run it on the cluster.
> I've also included code snippets and sbt deps at the bottom.
> When I've Googled this, there seem to be two somewhat vague responses:
> a) Mismatching Spark versions between the nodes and the user code
> b) Need to add more jars to the SparkConf
> Now I know that (b) is not the problem, having successfully run the same code on other clusters while including only one jar (it's a fat jar).
> But I have no idea how to check for (a) - it appears Spark doesn't have any version checks or anything - it would be nice if it checked versions and threw a "mismatching version exception: you have user code using version X and node Y has version Z" (a sketch of such a check follows below).
> I would be very grateful for advice on this.
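> The check asked for above does not exist in 0.9.1; what follows is an editorial sketch of one, not part of the original report, assuming Spark 1.0+ (where SparkContext.version and the org.apache.spark.SPARK_VERSION constant exist) and a spark.master supplied externally:
> // Editorial sketch - compare the driver's Spark version against the
> // version on every executor's classpath, and fail fast on a mismatch
> // instead of dying later with an opaque serialization error.
> import org.apache.spark.{SparkConf, SparkContext}
> object VersionCheck {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(new SparkConf().setAppName("version-check"))
>     val driverVersion = sc.version
>     // One small task per partition; each reports its executor's view.
>     val executorVersions = sc.parallelize(1 to 16, 16)
>       .map(_ => org.apache.spark.SPARK_VERSION)
>       .distinct()
>       .collect()
>     require(executorVersions.forall(_ == driverVersion),
>       s"mismatching version exception: user code uses $driverVersion " +
>       s"but nodes have ${executorVersions.mkString(", ")}")
>     sc.stop()
>   }
> }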
> The exception:
> Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
> 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> 	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59]
> My code snippet:
> val conf = new SparkConf()
>                .setMaster(clusterMaster)
>                .setAppName(appName)
>                .setSparkHome(sparkHome)
>                .setJars(SparkContext.jarOfClass(this.getClass))
> println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())
> My SBT dependencies:
> // relevant
> "org.apache.spark" % "spark-core_2.10" % "0.9.1",
> "org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
> // standard, probably unrelated
> "com.github.seratch" %% "awscala" % "[0.2,)",
> "org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
> "org.specs2" %% "specs2" % "1.14" % "test",
> "org.scala-lang" % "scala-reflect" % "2.10.3",
> "org.scalaz" %% "scalaz-core" % "7.0.5",
> "net.minidev" % "json-smart" % "1.2"
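If cause (a) is in play, a common mitigation (an editorial sketch, not from this thread) is to keep Spark and Hadoop out of the fat jar by marking them "provided" in sbt, so the only copies on the executors' classpath are the cluster's own; the version strings below are the reporter's, kept for illustration.

// Editorial sketch: "provided" keeps cluster-supplied jars out of the
// assembly, so the fat jar cannot smuggle in a mismatched Spark.
"org.apache.spark" % "spark-core_2.10" % "0.9.1" % "provided",
"org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0" % "provided"

The trade-off is that the application must then be launched against a cluster installation that supplies those jars; a plain `sbt run` on a developer machine would no longer find Spark on the classpath.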




