crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-466) Occasional Spark Test failures due to Future Timeouts
Date Fri, 29 Aug 2014 21:31:53 GMT
Micah Whitacre created CRUNCH-466:
-------------------------------------

             Summary: Occasional Spark Test failures due to Future Timeouts
                 Key: CRUNCH-466
                 URL: https://issues.apache.org/jira/browse/CRUNCH-466
             Project: Crunch
          Issue Type: Bug
          Components: Core
            Reporter: Micah Whitacre
            Assignee: Josh Wills


When building master and the 0.11 RC on one of my devices, I started getting sporadic test failures.
The failing test changed between runs. The error seems to be related to Spark starting
up for testing rather than to anything wrong with our code.

Here is an example of one of the failures...
{quote}
14/08/29 16:16:17 INFO Remoting: Starting remoting
14/08/29 16:16:27 ERROR Remoting: Remoting error: [Startup timed out] [
akka.remote.RemoteTransportException: Startup timed out
	at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:129)
	at akka.remote.Remoting.start(Remoting.scala:191)
	at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
	at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:579)
	at akka.actor.ActorSystemImpl._start(ActorSystem.scala:577)
	at akka.actor.ActorSystemImpl.start(ActorSystem.scala:588)
	at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
	at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
	at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:104)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:152)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:67)
	at org.apache.crunch.impl.spark.SparkPipeline.runAsync(SparkPipeline.java:137)
	at org.apache.crunch.impl.spark.SparkPipeline.run(SparkPipeline.java:110)
	at org.apache.crunch.materialize.MaterializableIterable.iterator(MaterializableIterable.java:94)
	at com.google.common.collect.Lists.newArrayList(Lists.java:125)
	at org.apache.crunch.SparkAggregatorIT.testCount(SparkAggregatorIT.java:43)
{quote}

If we changed the tests to specify a SparkConf, we should be able to increase the akka.actor.timeout
setting. I also saw a few posts about Akka having trouble if it spins up a lot of actors.
I haven't looked into Spark's testing framework, but if we could consolidate startup/shutdown
to the beginning and end of a suite, it might help.
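
As a rough illustration of the consolidation idea, here is a minimal sketch in plain Java with no Spark or JUnit dependency. SharedResource and SharedResourceDemo are hypothetical names, not existing Crunch or Spark classes; the strings stand in for whatever expensive context the tests would share. The point is only that the costly startup happens once no matter how many tests ask for the resource.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: starts an expensive resource once and shares it across tests. */
final class SharedResource<T> {
  private final Supplier<T> starter;
  private final Consumer<T> stopper;
  private T instance;

  SharedResource(Supplier<T> starter, Consumer<T> stopper) {
    this.starter = starter;
    this.stopper = stopper;
  }

  /** Returns the shared instance, starting it on first use only. */
  synchronized T get() {
    if (instance == null) {
      instance = starter.get();
    }
    return instance;
  }

  /** Stops the instance if it was started; meant to be called once at the end of a suite. */
  synchronized void shutdown() {
    if (instance != null) {
      stopper.accept(instance);
      instance = null;
    }
  }
}

class SharedResourceDemo {
  public static void main(String[] args) {
    AtomicInteger startups = new AtomicInteger();
    AtomicInteger shutdowns = new AtomicInteger();

    // Stand-in for an expensive context; a real suite would create its Spark context here.
    SharedResource<String> shared = new SharedResource<>(
        () -> "context-" + startups.incrementAndGet(),
        ctx -> shutdowns.incrementAndGet());

    // Three "tests" ask for the resource, but only the first one pays the startup cost.
    for (int i = 0; i < 3; i++) {
      shared.get();
    }
    shared.shutdown();

    System.out.println(startups.get() + " startup(s), " + shutdowns.get() + " shutdown(s)");
  }
}
```

In an actual suite, the starter would presumably build the Spark context and shutdown() would be called from a suite-level teardown hook. Whether the existing tests can share a context this way is an assumption I have not verified.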



--
This message was sent by Atlassian JIRA
(v6.2#6252)