spark-dev mailing list archives

From jerryye <jerr...@gmail.com>
Subject saveAsTextFile makes no progress without caching RDD
Date Fri, 22 Aug 2014 00:13:50 GMT
Hi, 
Cross-posting this from the users list.

I'm running on branch-1.1 and trying to apply a simple transformation to a
relatively small dataset (64GB). saveAsTextFile essentially hangs, with
tasks stuck in the running state, when I use the following code:

// Stalls with tasks running for over an hour with no tasks finishing.
// Smallest partition is 10MB
val data = sc.textFile("s3n://input")
val reformatted = data.map(t =>
  t.replace("Test(","").replace(")","").replaceAll(",","\t"))
reformatted.saveAsTextFile("s3n://transformed")

// This runs but stalls doing GC after filling up 150% of 650GB of memory
val data = sc.textFile("s3n://input")
val reformatted = data.map(t =>
  t.replace("Test(","").replace(")","").replaceAll(",","\t")).cache
reformatted.saveAsTextFile("s3n://transformed")
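
As a sanity check outside Spark, the per-line transformation can be
exercised on a single string. This is only a sketch: the sample line
Test(a,b,c) is an assumed input format inferred from the replace calls,
and the names ReformatCheck and reformatLine are hypothetical.

```scala
object ReformatCheck {
  // Same per-line transformation as in the map() calls above:
  // drop the "Test(" prefix and ")" characters, turn commas into tabs.
  def reformatLine(t: String): String =
    t.replace("Test(", "").replace(")", "").replaceAll(",", "\t")

  def main(args: Array[String]): Unit = {
    // Hypothetical sample line; the real input format is an assumption.
    val sample = "Test(a,b,c)"
    println(reformatLine(sample))
  }
}
```

If this behaves as expected on sample lines, the string handling itself
is probably not what stalls the job.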

Any idea whether this is a parameter issue, and is there something I should
try out?

Thanks! 

- jerry 



