spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-13997) Use Hadoop 2.0 default value for compression in data sources
Date Fri, 18 Mar 2016 04:17:33 GMT

    [ https://issues.apache.org/jira/browse/SPARK-13997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200984#comment-15200984 ]

Apache Spark commented on SPARK-13997:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/11806

> Use Hadoop 2.0 default value for compression in data sources
> ------------------------------------------------------------
>
>                 Key: SPARK-13997
>                 URL: https://issues.apache.org/jira/browse/SPARK-13997
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>            Priority: Trivial
>
> Currently, the JSON, TEXT and CSV data sources use the {{CompressionCodecs}} class to set compression
> configurations via {{option("compress", "codec")}}.
> I made this use the Hadoop 1.x default value (block-level compression). However, the default
> value in Hadoop 2.x is record-level compression, as described in [mapred-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml].
> Since Spark is dropping Hadoop 1.x support, it makes sense to use the Hadoop 2.x default values.
> According to [Hadoop: The Definitive Guide, 3rd edition|https://www.safaribooksonline.com/library/view/hadoop-the-definitive/9781449328917/ch04.html],
> these appear to be configurations for the unit of compression (record or block).
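
For illustration only, the change described in the quoted issue can be sketched as a small helper that builds the Hadoop output-compression settings for a given codec. This is a minimal sketch and not Spark's actual {{CompressionCodecs}} code: the function name `codec_configuration` and its parameters are hypothetical, and the property names are assumed to be the ones listed in the Hadoop 2.x mapred-default.xml linked above.

```python
from typing import Dict, Optional

# Property names assumed from Hadoop 2.x mapred-default.xml.
COMPRESS = "mapreduce.output.fileoutputformat.compress"
COMPRESS_TYPE = "mapreduce.output.fileoutputformat.compress.type"
COMPRESS_CODEC = "mapreduce.output.fileoutputformat.compress.codec"

# Hadoop 2.x default unit of compression, per the issue description.
HADOOP2_DEFAULT_TYPE = "RECORD"
VALID_TYPES = ("NONE", "RECORD", "BLOCK")


def codec_configuration(
    codec: Optional[str],
    compress_type: str = HADOOP2_DEFAULT_TYPE,
) -> Dict[str, str]:
    """Hypothetical helper: return the Hadoop properties for output compression.

    codec is a fully qualified codec class name, or None to disable compression.
    compress_type selects the unit of compression (record or block).
    """
    if compress_type not in VALID_TYPES:
        raise ValueError(f"unknown compression type: {compress_type}")
    if codec is None:
        # No codec requested: turn output compression off.
        return {COMPRESS: "false"}
    return {
        COMPRESS: "true",
        COMPRESS_TYPE: compress_type,
        COMPRESS_CODEC: codec,
    }


if __name__ == "__main__":
    gzip = "org.apache.hadoop.io.compress.GzipCodec"
    # Proposed behaviour: rely on the Hadoop 2.x default (record level).
    print(codec_configuration(gzip))
    # Previous behaviour: force the Hadoop 1.x style block-level setting.
    print(codec_configuration(gzip, compress_type="BLOCK"))
```

Keeping the compression type as a parameter with the Hadoop 2.x value as its default makes the difference between the old and proposed behaviour a one-argument change.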



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

