spark-issues mailing list archives

From "Shixiong Zhu (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-19525) Enable Compression of RDD Checkpoints
Date Fri, 28 Apr 2017 22:31:04 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu resolved SPARK-19525.
----------------------------------
    Resolution: Fixed
      Assignee: Aaditya Ramesh

> Enable Compression of RDD Checkpoints
> -------------------------------------
>
>                 Key: SPARK-19525
>                 URL: https://issues.apache.org/jira/browse/SPARK-19525
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Aaditya Ramesh
>            Assignee: Aaditya Ramesh
>             Fix For: 2.2.0
>
>
> In our testing, compressing partitions with Snappy while writing them to checkpoints
> on HDFS significantly improved performance and also reduced the variability of the
> checkpointing operation. Checkpointing time was reduced by 3x and variability by 2x
> for data sets with a compressed size of approximately 1 GB.
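
The idea in the description (compress each partition as it is written to the checkpoint) can be illustrated with a minimal, self-contained sketch. This is not Spark's implementation: `write_partition`, `read_partition`, and `demo` are hypothetical helper names, local temporary files stand in for HDFS, and zlib from the Python standard library stands in for Snappy, so the sizes it reports say nothing about the 3x and 2x figures above.

```python
import os
import pickle
import tempfile
import zlib


def write_partition(path, records, compress=True):
    """Serialize one partition's records and write them to `path`.

    Returns the number of bytes written.
    """
    payload = pickle.dumps(records)
    if compress:
        # Assumption: zlib stands in for Snappy; level 1 favors speed over ratio.
        payload = zlib.compress(payload, 1)
    with open(path, "wb") as f:
        f.write(payload)
    return len(payload)


def read_partition(path, compressed=True):
    """Read a partition file back and deserialize its records."""
    with open(path, "rb") as f:
        payload = f.read()
    if compressed:
        payload = zlib.decompress(payload)
    return pickle.loads(payload)


def demo():
    """Write the same synthetic partition with and without compression."""
    # Synthetic, repetitive records; real checkpoint data will compress differently.
    records = [("user-%d" % (i % 100), i % 7) for i in range(50000)]
    with tempfile.TemporaryDirectory() as d:
        raw_path = os.path.join(d, "part-00000")
        zip_path = os.path.join(d, "part-00000.z")
        raw_size = write_partition(raw_path, records, compress=False)
        zip_size = write_partition(zip_path, records, compress=True)
        restored = read_partition(zip_path, compressed=True)
    return raw_size, zip_size, restored == records


if __name__ == "__main__":
    raw_size, zip_size, ok = demo()
    print("uncompressed bytes:", raw_size)
    print("compressed bytes:  ", zip_size)
    print("round trip intact: ", ok)
```

Writing fewer bytes per partition is the likely source of the reported speedup, since checkpoint writes are dominated by I/O. In Spark itself the behavior is, to my knowledge, toggled with the `spark.checkpoint.compress` setting; that name is not stated in this message, so check it against the configuration documentation for your version.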



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

