From: mickdelaney
Date: Sat, 14 Feb 2015 09:10:53 -0700 (MST)
To: user@spark.apache.org
Subject: Re: SparkException: Task not serializable - Jackson Json

To get past this you can move the mapper creation code
down into the closure. It's then created on the worker node, so it doesn't need to be serialized:

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

// Parse it into a specific case class. We use flatMap to handle errors
// by returning an empty list (None) if we encounter an issue and a
// list with one element if everything is ok (Some(_)).
val result = input.flatMap(record => {
  try {
    val mapper = new ObjectMapper with ScalaObjectMapper
    mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    mapper.registerModule(DefaultScalaModule)
    Some(mapper.readValue(record, classOf[Company]))
  } catch {
    case e: Exception => None
  }
})
result.map(company => {
  // The mapper above is scoped to the flatMap closure, so create another
  // one here for serialization (again on the worker, not the driver).
  val mapper = new ObjectMapper with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  mapper.writeValueAsString(company)
}).saveAsTextFile(outputFile)
```

BUT for more efficiency, look into creating the mapper in a *mapPartitions* iterator: it's still created on the worker node, but only once per partition rather than once per row as above.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkException-Task-not-serializable-Jackson-Json-tp21347p21655.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
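For reference, a minimal sketch of the mapPartitions variant mentioned above, assuming the same `input` RDD, `Company` case class, and Jackson Scala module as in the snippet (variable names here are illustrative, not from the original code):

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

// One mapper per partition: the closure body runs once per partition on
// the worker, and the iterator then covers every record in that partition.
val parsed = input.mapPartitions(records => {
  val mapper = new ObjectMapper with ScalaObjectMapper
  mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
  mapper.registerModule(DefaultScalaModule)
  records.flatMap(record =>
    try Some(mapper.readValue(record, classOf[Company]))
    catch { case e: Exception => None }
  )
})
```

Because ObjectMapper is thread-safe after configuration, sharing one instance across all records in a partition is safe and avoids the per-record construction cost.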