From: jerryye
To: dev@spark.incubator.apache.org
Date: Thu, 21 Aug 2014 17:13:50 -0700 (PDT)
Subject: saveAsTextFile makes no progress without caching RDD

Hi,

Cross-posting this from users list.
I'm running on branch-1.1 and trying to apply a simple transformation to a relatively small dataset (64GB). saveAsTextFile essentially hangs, with tasks stuck in the running state, when I use the following code:

    // Stalls with tasks running for over an hour and no tasks finishing.
    // Smallest partition is 10MB.
    val data = sc.textFile("s3n://input")
    val reformatted = data.map(t => t.replace("Test(","").replace(")","").replaceAll(",","\t"))
    reformatted.saveAsTextFile("s3n://transformed")

The same job with the RDD cached:

    // This runs but stalls doing GC after filling up 150% of 650GB of memory.
    val data = sc.textFile("s3n://input")
    val reformatted = data.map(t => t.replace("Test(","").replace(")","").replaceAll(",","\t")).cache
    reformatted.saveAsTextFile("s3n://transformed")

Any idea whether this is a parameter issue, and is there something I should try out?

Thanks!

- jerry
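P.S. In case it helps rule out the transformation itself, here is a minimal sketch of the per-line rewrite pulled out of the map() so it can be checked on a single string without a cluster. The names Reformat and reformatLine and the sample record are made up for illustration; only the replace/replaceAll chain comes from the job above.

```scala
// Hypothetical helper: the same per-line rewrite as the map() in the job,
// isolated so it can be run locally on one string.
object Reformat {
  def reformatLine(line: String): String =
    line.replace("Test(", "").replace(")", "").replaceAll(",", "\t")

  def main(args: Array[String]): Unit = {
    // Made-up sample record, not taken from the real input.
    val sample = "Test(a,b,c)"
    // Prints the three fields separated by tab characters.
    println(reformatLine(sample))
  }
}
```

If this behaves as expected on sample lines, the stall is more likely related to the S3 read/write or memory settings than to the string handling, though that is only a guess on my part.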