From: mickdelaney
Date: Sat, 14 Feb 2015 09:10:53 -0700 (MST)
To: user@spark.apache.org
Subject: Re: SparkException: Task not serializable - Jackson Json

To get past this you can move the mapper creation code
down into the closure. It's then created on the worker node, so it doesn't need to be serialized:

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

// Parse it into a specific case class. We use flatMap to handle errors
// by returning an empty list (None) if we encounter an issue and a
// list with one element if everything is ok (Some(_)).
val result = input.flatMap(record => {
  try {
    val mapper = new ObjectMapper with ScalaObjectMapper
    mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    mapper.registerModule(DefaultScalaModule)
    Some(mapper.readValue(record, classOf[Company]))
  } catch {
    case e: Exception => None
  }
})
result.map(company => {
  // The mapper above is scoped to the flatMap closure, so create another
  // one here for serialization (again on the worker, not the driver).
  val mapper = new ObjectMapper with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  mapper.writeValueAsString(company)
}).saveAsTextFile(outputFile)
```

BUT for more efficiency, look into creating the mapper in a *mapPartitions* iterator: it's still created on the worker node, but only once per partition rather than once per row as above.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkException-Task-not-serializable-Jackson-Json-tp21347p21655.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
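For reference, a minimal sketch of the mapPartitions variant mentioned above, assuming the same `input` RDD, `Company` case class, and Jackson Scala module as in the snippet (variable names here are illustrative, not from the original code):

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

// One mapper per partition: the closure body runs once per partition on
// the worker, and the iterator then covers every record in that partition.
val parsed = input.mapPartitions(records => {
  val mapper = new ObjectMapper with ScalaObjectMapper
  mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
  mapper.registerModule(DefaultScalaModule)
  records.flatMap(record =>
    try Some(mapper.readValue(record, classOf[Company]))
    catch { case e: Exception => None }
  )
})
```

Because ObjectMapper is thread-safe after configuration, sharing one instance across all records in a partition is safe and avoids the per-record construction cost.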