spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From suman bharadwaj <suman....@gmail.com>
Subject Re: KRYO usage details: Need Help
Date Wed, 22 Jan 2014 23:13:16 GMT
Brilliant. Thanks !!

Regards,
SB


On Thu, Jan 23, 2014 at 3:42 AM, Matei Zaharia <matei.zaharia@gmail.com>wrote:

> You should register each class you plan to use within your RDDs. In your
> case you only have an RDD of Strings, so you don’t even really need a
> registrator (strings are registered by default). But if you made custom
> objects you would use one.
>
> To speed up Kryo you can also add kryo.setReferences(false) or set spark.kryo.referenceTracking
> = false. This disables tracking of circular references. But in general
> benchmarking on this small amount of data, you’ll probably have noise from
> the JVM starting up.
>
> Matei
>
> On Jan 22, 2014, at 12:27 PM, suman bharadwaj <suman.dna@gmail.com> wrote:
>
> Hi,
>
> I'm using the below SPARK Code. Currently i have a file of size 25 MB. And
> I'm trying to do a comparative study on Kryo and Java serialization.
>
> I had couple of questions:
>
> 1. How do you know which classes to register in Kryo ? [ highlighted in
> yellow ]
> 2. When data is small, I'm seeing Java Serialization has better
> performance than Kryo. so was wondering whether the below code represents
>  the correct usage of Kryo ?
>
> *import org.apache.spark._*
> *import com.esotericsoftware.kryo.Kryo*
> *import org.apache.spark.serializer.KryoRegistrator*
> *import org.apache.hadoop.io.LongWritable*
> *import org.apache.hadoop.io.Text*
> *import org.apache.spark.storage.StorageLevel*
>
> *class MyRegistrator extends KryoRegistrator {*
> *  override def registerClasses(kryo: Kryo) {*
> *        kryo.register(classOf[LongWritable])*
> *        kryo.register(classOf[Text])*
> *        kryo.register(classOf[Integer])*
> *        kryo.register(classOf[Array[String]])*
> *  }*
> *}*
>
> *object HTest {*
>
>  *  def main(args: Array[String]) {*
> *        System.setProperty("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer")*
> *        System.setProperty("spark.kryo.registrator", "MyRegistrator")*
> *        val sc = new SparkContext("local[4]","Test")*
> *        val input =
> sc.textFile("/home/Test/DataSet/cd7a58dc-2053-4811-8463-b144781352ac_000004.csv").persist(StorageLevel.MEMORY_ONLY_SER)*
> *        println(input.count())*
> *        Thread.sleep(30000L)*
> *        println(input.count())*
> *        Thread.sleep(30000L)*
> *  }*
> *}*
>
> Your Help is Highly appreciated.
>
> Regards,
> SB
>
>
>

Mime
View raw message