spark-issues mailing list archives

From "Min Li (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-1967) Using parallelize method to create RDD, wordcount app just hanging there without errors or warnings
Date Thu, 29 May 2014 20:23:02 GMT
Min Li created SPARK-1967:
-----------------------------

             Summary: Using parallelize method to create RDD, wordcount app just hanging there without errors or warnings
                 Key: SPARK-1967
                 URL: https://issues.apache.org/jira/browse/SPARK-1967
             Project: Spark
          Issue Type: Bug
    Affects Versions: 0.9.1
         Environment: Ubuntu 12.04, single-machine Spark standalone, 8 cores, 8 GB memory, Spark 0.9.1, Java 1.7
            Reporter: Min Li


I was trying the parallelize method to create an RDD, using Java. It is a simple wordcount program, except that I first read the input into memory and then use the parallelize method to create the RDD, rather than the textFile method used in the given example.
Pseudo code:
JavaSparkContext ctx = new JavaSparkContext($SparkMasterURL, $NAME, $SparkHome, $jars);
List<String> input = ...; // read lines from the input file into an ArrayList<String>
JavaRDD<String> lines = ctx.parallelize(input);
// followed by wordcount
---- the above is not working.

JavaRDD<String> lines = ctx.textFile(file);
// followed by wordcount
---- this is working.
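
For reference, here is a minimal self-contained sketch of the failing path described above, written against the Spark 0.9.x Java API. The master URL, app name, Spark home, jar path and input path below are placeholders, not values taken from this report:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

public class ParallelizeWordCount {
    public static void main(String[] args) throws Exception {
        // All of these values are placeholders for the reporter's actual settings.
        JavaSparkContext ctx = new JavaSparkContext(
                "spark://spark:7077", "ParallelizeWordCount",
                "/path/to/spark-0.9.1",
                new String[] { "/path/to/spark-test-jar-with-dependencies.jar" });

        // Read the whole input into driver memory first, then parallelize it
        // (instead of the usual ctx.textFile(...)).
        List<String> input = Files.readAllLines(
                Paths.get("/path/to/input.txt"), StandardCharsets.UTF_8);
        JavaRDD<String> lines = ctx.parallelize(input);

        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String line) {
                return Arrays.asList(line.split(" "));
            }
        });

        // Spark 0.9.x Java API: map(PairFunction) produces a JavaPairRDD
        // (this became mapToPair in later releases).
        JavaPairRDD<String, Integer> ones =
                words.map(new PairFunction<String, String, Integer>() {
                    public Tuple2<String, Integer> call(String word) {
                        return new Tuple2<String, Integer>(word, 1);
                    }
                });

        JavaPairRDD<String, Integer> counts =
                ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
                    public Integer call(Integer a, Integer b) {
                        return a + b;
                    }
                });

        // The reported hang shows up once an action runs: the reduceByKey
        // stage sits at 0/2 succeeded tasks.
        for (Tuple2<String, Integer> pair : counts.collect()) {
            System.out.println(pair._1() + ": " + pair._2());
        }

        ctx.stop();
    }
}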

The log is:
14/05/29 16:18:43 INFO Slf4jLogger: Slf4jLogger started
14/05/29 16:18:43 INFO Remoting: Starting remoting
14/05/29 16:18:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@spark:55224]
14/05/29 16:18:43 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@spark:55224]
14/05/29 16:18:43 INFO SparkEnv: Registering BlockManagerMaster
14/05/29 16:18:43 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140529161843-836a
14/05/29 16:18:43 INFO MemoryStore: MemoryStore started with capacity 1056.0 MB.
14/05/29 16:18:43 INFO ConnectionManager: Bound socket to port 42942 with id = ConnectionManagerId(spark,42942)
14/05/29 16:18:43 INFO BlockManagerMaster: Trying to register BlockManager
14/05/29 16:18:43 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager spark:42942 with 1056.0 MB RAM
14/05/29 16:18:43 INFO BlockManagerMaster: Registered BlockManager
14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
14/05/29 16:18:43 INFO HttpBroadcast: Broadcast server started at http://10.227.119.185:43522
14/05/29 16:18:43 INFO SparkEnv: Registering MapOutputTracker
14/05/29 16:18:43 INFO HttpFileServer: HTTP File server directory is /tmp/spark-3704a621-789c-4d97-b1fc-9654236dba3e
14/05/29 16:18:43 INFO HttpServer: Starting HTTP Server
14/05/29 16:18:43 INFO SparkUI: Started Spark Web UI at http://spark:4040
14/05/29 16:18:44 INFO SparkContext: Added JAR /home/maxmin/tmp/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar at http://10.227.119.185:55286/jars/spark-test-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1401394724045
14/05/29 16:18:44 INFO AppClient$ClientActor: Connecting to master spark://spark:7077...
14/05/29 16:18:44 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140529161844-0001
14/05/29 16:18:44 INFO AppClient$ClientActor: Executor added: app-20140529161844-0001/0 on worker-20140529155406-spark-59658 (spark:59658) with 8 cores

The app hangs here forever, and the web UIs at spark:8080 and spark:4040 do not show anything unusual. The Spark Stages page shows reduceByKey as the active stage, with tasks Succeeded/Total at 0/2. I have also tried calling lines.count() directly after parallelize, and the app gets stuck at the count stage.
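
As a sketch of that shorter variant (reusing the hypothetical ctx and input from the snippet above), the job hangs on the very first action:

JavaRDD<String> lines = ctx.parallelize(input);
long n = lines.count();   // reported behaviour: this action never completes
System.out.println("line count = " + n);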

I used Spark 0.9.1 with the default spark-env.sh, and the slaves file lists only one host. I used Maven to compile a fat jar with Spark specified as provided, and I modified the run-example script to submit the jar.



