spark-issues mailing list archives

From "Sandeep Singh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-928) Add support for Unsafe-based serializer in Kryo 2.22
Date Mon, 09 May 2016 07:03:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265805#comment-15265805 ]

Sandeep Singh edited comment on SPARK-928 at 5/9/16 7:02 AM:
-------------------------------------------------------------

[~joshrosen] I would like to work on this.

I benchmarked the difference between unsafe Kryo and our current implementation. We can
then add a spark.kryo.useUnsafe flag, as Matei has mentioned.

{code:title=Benchmarking results|borderStyle=solid}
Benchmark Kryo Unsafe vs safe Serialization: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------
basicTypes: Int unsafe:true                    160 /  178         98.5          10.1            1.0X
basicTypes: Long unsafe:true                   210 /  218         74.9          13.4            0.8X
basicTypes: Float unsafe:true                  203 /  213         77.5          12.9            0.8X
basicTypes: Double unsafe:true                 226 /  235         69.5          14.4            0.7X
Array: Int unsafe:true                        1087 / 1101         14.5          69.1            0.1X
Array: Long unsafe:true                       2758 / 2844          5.7         175.4            0.1X
Array: Float unsafe:true                      1511 / 1552         10.4          96.1            0.1X
Array: Double unsafe:true                     2942 / 2972          5.3         187.0            0.1X
Map of string->Double unsafe:true             2645 / 2739          5.9         168.2            0.1X
basicTypes: Int unsafe:false                   211 /  218         74.7          13.4            0.8X
basicTypes: Long unsafe:false                  247 /  253         63.6          15.7            0.6X
basicTypes: Float unsafe:false                 211 /  216         74.5          13.4            0.8X
basicTypes: Double unsafe:false                227 /  233         69.2          14.4            0.7X
Array: Int unsafe:false                       3012 / 3032          5.2         191.5            0.1X
Array: Long unsafe:false                      4463 / 4515          3.5         283.8            0.0X
Array: Float unsafe:false                     2788 / 2868          5.6         177.2            0.1X
Array: Double unsafe:false                    3558 / 3752          4.4         226.2            0.0X
Map of string->Double unsafe:false            2806 / 2933          5.6         178.4            0.1X
{code}
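
As a rough aid for reading these numbers: the Relative column appears to be the first row's best time divided by each row's best time (for example, 160 / 210 is about 0.8X for "basicTypes: Long unsafe:true"), so it does not compare unsafe against safe directly. Below is a minimal sketch that derives the per-type unsafe-vs-safe speedup from the best times in the table. The class and method names are hypothetical and are not part of the benchmark code.

```java
/** Hypothetical helper for reading the benchmark table; not part of Spark or Kryo. */
final class KryoBenchmarkMath {

    /** Relative column (as inferred from the table): baseline best time / this row's best time. */
    static double relative(double baselineBestMs, double bestMs) {
        return baselineBestMs / bestMs;
    }

    /** Speedup of unsafe over safe for one data type, from the two best times. */
    static double speedup(double safeBestMs, double unsafeBestMs) {
        return safeBestMs / unsafeBestMs;
    }

    public static void main(String[] args) {
        // Best times (ms) copied from the edited benchmark table, in row order.
        String[] names = {
            "basicTypes: Int", "basicTypes: Long", "basicTypes: Float", "basicTypes: Double",
            "Array: Int", "Array: Long", "Array: Float", "Array: Double",
            "Map of string->Double"
        };
        double[] unsafeBestMs = {160, 210, 203, 226, 1087, 2758, 1511, 2942, 2645};
        double[] safeBestMs   = {211, 247, 211, 227, 3012, 4463, 2788, 3558, 2806};

        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-24s unsafe is %.2fx the speed of safe%n",
                    names[i], speedup(safeBestMs[i], unsafeBestMs[i]));
        }
    }
}
```

On these figures the gap is largest for Int arrays (3012 / 1087, about 2.8x) and close to parity for Double basic types and the string->Double map.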

You can find the benchmarking code here (https://github.com/techaddict/spark/commit/46fa44141c849ca15bbd6136cea2fa52bd927da2).
It is very ugly right now, but I will improve it (and add more benchmarks) before creating a PR.
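
For readers who want a feel for how a Best/Avg Time(ms) pair can be measured, here is a minimal, self-contained sketch. It is not the code at the linked commit and does not use Kryo: the two encoders are hypothetical stand-ins (a per-element write and a bulk write into a ByteBuffer), and all names are assumptions made for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical best/avg timing harness; a sketch, not Spark's Benchmark class. */
final class BestAvgTimer {

    /** Best and average wall-clock time of a task, in milliseconds. */
    static final class Timing {
        final double bestMs;
        final double avgMs;

        Timing(double bestMs, double avgMs) {
            this.bestMs = bestMs;
            this.avgMs = avgMs;
        }
    }

    /** Runs the task the given number of times and records the best and average time. */
    static Timing time(Runnable task, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long bestNs = Long.MAX_VALUE;
        long totalNs = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            bestNs = Math.min(bestNs, elapsed);
            totalNs += elapsed;
        }
        return new Timing(bestNs / 1e6, (totalNs / (double) iterations) / 1e6);
    }

    /** Stand-in encoder: writes each int individually. */
    static byte[] encodePerElement(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    /** Stand-in encoder: writes the whole array with one bulk put. */
    static byte[] encodeBulk(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        buf.asIntBuffer().put(values);
        return buf.array();
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        Timing perElement = time(() -> encodePerElement(data), 10);
        Timing bulk = time(() -> encodeBulk(data), 10);
        System.out.printf("per-element: %.1f / %.1f ms%n", perElement.bestMs, perElement.avgMs);
        System.out.printf("bulk:        %.1f / %.1f ms%n", bulk.bestMs, bulk.avgMs);
    }
}
```

The timings depend on the machine and on JIT warm-up, so a real benchmark would add warm-up iterations before measuring.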



was (Author: techaddict):
[~joshrosen] I would like to work on this.

I benchmarked the difference between unsafe Kryo and our current implementation. We can
then add a spark.kryo.useUnsafe flag, as Matei has mentioned.

{code:title=Benchmarking results|borderStyle=solid}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.11.4
Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz

Benchmark Kryo Unsafe vs safe Serialization: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------
basicTypes: Int unsafe:false                     2 /    4       8988.0           0.1            1.0X
basicTypes: Long unsafe:false                    1 /    1      13981.3           0.1            1.6X
basicTypes: Float unsafe:false                   1 /    1      14460.6           0.1            1.6X
basicTypes: Double unsafe:false                  1 /    1      15876.9           0.1            1.8X
Array: Int unsafe:false                         33 /   44        474.8           2.1            0.1X
Array: Long unsafe:false                        18 /   25        888.6           1.1            0.1X
Array: Float unsafe:false                       10 /   16       1627.4           0.6            0.2X
Array: Double unsafe:false                      10 /   13       1523.1           0.7            0.2X
Map of string->Double unsafe:false             413 /  447         38.1          26.3            0.0X
basicTypes: Int unsafe:true                      1 /    1      16402.6           0.1            1.8X
basicTypes: Long unsafe:true                     1 /    1      19732.1           0.1            2.2X
basicTypes: Float unsafe:true                    1 /    1      19752.9           0.1            2.2X
basicTypes: Double unsafe:true                   1 /    1      23111.4           0.0            2.6X
Array: Int unsafe:true                           7 /    8       2239.9           0.4            0.2X
Array: Long unsafe:true                          8 /    9       2000.1           0.5            0.2X
Array: Float unsafe:true                         7 /    8       2191.5           0.5            0.2X
Array: Double unsafe:true                        9 /   10       1841.2           0.5            0.2X
Map of string->Double unsafe:true              387 /  407         40.7          24.6            0.0X
{code}

You can find the benchmarking code here (https://github.com/techaddict/spark/commit/46fa44141c849ca15bbd6136cea2fa52bd927da2).
It is very ugly right now, but I will improve it (and add more benchmarks) before creating a PR.


> Add support for Unsafe-based serializer in Kryo 2.22
> ----------------------------------------------------
>
>                 Key: SPARK-928
>                 URL: https://issues.apache.org/jira/browse/SPARK-928
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>              Labels: starter
>
> This can reportedly be quite a bit faster, but it also requires Chill to update its Kryo
> dependency. Once that happens we should add a spark.kryo.useUnsafe flag.


