spark-user mailing list archives

From lev
Subject Prevent spark from serializing some objects
Date Tue, 15 Sep 2015 12:30:30 GMT

As I understand it, calling the method /bar/ will result in the /Foo/
instance being serialized and sent to the cluster:

class Foo() {
    val x = 5

    def bar(rdd: RDD[Int]): RDD[Int] = {
        rdd.map(_ * x)
    }
}

Since the /Foo/ instance might be very big, this could cause performance
problems.

I know how to solve this case (create a local copy of /x/ inside /bar/, and
use it).
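
The local-copy workaround can be tried without a cluster. What follows is a minimal sketch, not Spark code: it uses plain Java serialization as a stand-in for what happens to a closure before it is shipped, and it deliberately leaves /Foo/ non-serializable so that capturing it is easy to detect. The names capturing, local, serializes and ClosureCaptureDemo are made up for this illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Deliberately NOT Serializable here (unlike the Foo in the post),
// so that any closure dragging it along fails to serialize.
class Foo {
  val x = 5

  // Reads the field x, i.e. this.x, so the closure captures `this`.
  def capturing: Int => Int = _ * x

  // Copies x into a local val first, so the closure captures only an Int.
  def local: Int => Int = {
    val localX = x
    _ * localX
  }
}

object ClosureCaptureDemo {
  // Returns true if plain Java serialization accepts the whole object graph.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val foo = new Foo
    println(s"closure over field x serializes: ${serializes(foo.capturing)}")
    println(s"closure over local copy serializes: ${serializes(foo.local)}")
  }
}
```

The same distinction applies to a function passed to rdd.map: what matters is whether the function body refers to a member of the enclosing instance or only to local values.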

I would like to know:
is there a way to prevent /Foo/ from ever being serialized and sent to the
cluster?

I can't make /Foo/ non-serializable, since it needs to be serialized
at some other stage (not sent to Spark, just saved to disk).

One idea that I tried was to create a trait like:

trait SparkNonSerializable

class Foo extends SparkNonSerializable {...}

and use a custom serializer in Spark (set via "spark.serializer")
that fails for objects that extend /SparkNonSerializable/,
but I wasn't able to make it work
(the serializer is being used, but the condition
/t.isInstanceOf[SparkNonSerializable]/ is never true).

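Here is my code:

import java.io.{InputStream, OutputStream}
import java.nio.ByteBuffer
import scala.reflect.ClassTag
import org.apache.spark.SparkConf
import org.apache.spark.serializer.{DeserializationStream, JavaSerializer, SerializationStream, SerializerInstance}

trait SparkNonSerializable

class MySerializer(conf: SparkConf) extends JavaSerializer(conf) {
  override def newInstance(): SerializerInstance = {
    val inst = super.newInstance()

    // Wrap the Java serializer instance and reject marked objects.
    // The exception type below is illustrative only.
    new SerializerInstance {
      override def serializeStream(s: OutputStream): SerializationStream =
        inst.serializeStream(s)

      override def serialize[T: ClassTag](t: T): ByteBuffer =
        if (t.isInstanceOf[SparkNonSerializable])
          throw new IllegalArgumentException("SparkNonSerializable: " + t.getClass)
        else
          inst.serialize(t)

      override def deserializeStream(s: InputStream): DeserializationStream =
        inst.deserializeStream(s)

      override def deserialize[T: ClassTag](bytes: ByteBuffer): T = {
        val t = inst.deserialize[T](bytes)
        if (t.isInstanceOf[SparkNonSerializable])
          throw new IllegalArgumentException("SparkNonSerializable: " + t.getClass)
        t
      }

      override def deserialize[T: ClassTag](bytes: ByteBuffer, loader: ClassLoader): T = {
        val t = inst.deserialize[T](bytes, loader)
        if (t.isInstanceOf[SparkNonSerializable])
          throw new IllegalArgumentException("SparkNonSerializable: " + t.getClass)
        t
      }
    }
  }
}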
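
One thing worth checking, offered as an assumption rather than a diagnosis: a test like /t.isInstanceOf[SparkNonSerializable]/ only looks at the top-level object handed to the serializer. If that object is a closure, the /Foo/ instance sits deeper in the object graph and the test never sees it. The sketch below uses plain Java serialization, not Spark, to show a check that runs for every object in the graph through ObjectOutputStream.replaceObject. GuardedObjectOutputStream, GuardDemo and rejected are hypothetical names for this illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream, OutputStream}

// Marker trait, as in the post.
trait SparkNonSerializable

// Hypothetical helper: fails as soon as a marked object appears
// anywhere in the graph being written, not just at the top level.
class GuardedObjectOutputStream(out: OutputStream) extends ObjectOutputStream(out) {
  // Ask the stream to call replaceObject for each object it writes.
  enableReplaceObject(true)

  override protected def replaceObject(obj: AnyRef): AnyRef = obj match {
    case _: SparkNonSerializable =>
      throw new NotSerializableException(obj.getClass.getName)
    case other => other
  }
}

// Serializable (so it can still be saved to disk elsewhere), but marked.
class Foo extends SparkNonSerializable with Serializable {
  val x = 5

  // Reads this.x, so the closure holds a reference to the Foo instance.
  def capturing: Int => Int = _ * x
}

object GuardDemo {
  // Returns true if the guarded stream refused to write the object graph.
  def rejected(obj: AnyRef): Boolean =
    try {
      val out = new GuardedObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      false
    } catch {
      case _: NotSerializableException => true
    }

  def main(args: Array[String]): Unit = {
    val f = new Foo().capturing
    // The closure itself does not carry the marker trait...
    println(s"top-level isInstanceOf check: ${f.isInstanceOf[SparkNonSerializable]}")
    // ...but the guarded stream still meets the Foo nested inside it.
    println(s"rejected by guarded stream: ${rejected(f)}")
  }
}
```

Whether this explains the behaviour in Spark depends on which serializer the closure actually passes through, which this sketch does not exercise.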

My questions are:
1. Do you see why my custom serializer can't catch objects with that trait?

2. Any other ideas for how to prevent /Foo/ from being serialized?
My solution might be OK for tests,
but I'm a bit reluctant to use my own serializer in production code.

