spark-issues mailing list archives

From "Dongjoon Hyun (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-25635) Support selective direct encoding in native ORC write
Date Wed, 03 Oct 2018 21:13:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-25635:
-------------------------------------

    Assignee: Dongjoon Hyun

> Support selective direct encoding in native ORC write
> -----------------------------------------------------
>
>                 Key: SPARK-25635
>                 URL: https://issues.apache.org/jira/browse/SPARK-25635
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>
> Before ORC 1.5.3, `orc.dictionary.key.threshold` and `hive.exec.orc.dictionary.key.size.threshold`
are applied to all columns. This is a big hurdle to enabling dictionary encoding.
> From ORC 1.5.3, `orc.column.encoding.direct` is added to enforce direct encoding selectively
in a column-wise manner. This issue aims to add that feature by upgrading ORC from 1.5.2 to
1.5.3.
> The following are the patches in ORC 1.5.3; this feature is the only one directly related
to Spark.
> {code}
> ORC-406: ORC: Char(n) and Varchar(n) writers truncate to n bytes & corrupts multi-byte
data (gopalv)
> ORC-403: [C++] Add checks to avoid invalid offsets in InputStream
> ORC-405. Remove calcite as a dependency from the benchmarks.
> ORC-375: Fix libhdfs on gcc7 by adding #include <functional> two places.
> ORC-383: Parallel builds fails with ConcurrentModificationException
> ORC-382: Apache rat exclusions + add rat check to travis
> ORC-401: Fix incorrect quoting in specification.
> ORC-385. Change RecordReader to extend Closeable.
> ORC-384: [C++] fix memory leak when loading non-ORC files
> ORC-391: [c++] parseType does not accept underscore in the field name
> ORC-397. Allow selective disabling of dictionary encoding. Original patch was by Mithun
Radhakrishnan.
> ORC-389: Add ability to not decode Acid metadata columns
> {code}
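
To illustrate how the per-column setting described above might be assembled, here is a minimal sketch that builds a dictionary of ORC writer options. The helper name `orc_direct_encoding_options` and the column name `uniq_id` are hypothetical and not part of the issue; the sketch assumes that `orc.column.encoding.direct` takes a comma-separated list of column names and that `orc.dictionary.key.threshold` is a fraction between 0 and 1.

```python
def orc_direct_encoding_options(direct_columns, dictionary_threshold=None):
    """Build ORC writer options for selective direct encoding (hypothetical helper).

    direct_columns: names of columns that should skip dictionary encoding.
    dictionary_threshold: optional value for orc.dictionary.key.threshold,
        which still applies to every column not listed in direct_columns.
    """
    # Drop empty entries and surrounding whitespace before joining.
    columns = [c.strip() for c in direct_columns if c and c.strip()]

    options = {}
    if columns:
        # Assumption: the option value is a comma-separated column list.
        options["orc.column.encoding.direct"] = ",".join(columns)

    if dictionary_threshold is not None:
        # Assumption: the threshold is a ratio in the range [0, 1].
        if not 0.0 <= dictionary_threshold <= 1.0:
            raise ValueError("dictionary_threshold must be between 0 and 1")
        options["orc.dictionary.key.threshold"] = str(dictionary_threshold)

    return options


opts = orc_direct_encoding_options(["uniq_id"], dictionary_threshold=1.0)
print(opts)
```

With PySpark, such a dictionary could presumably be passed along as `df.write.options(**opts).orc(path)`; whether the native ORC writer honors it depends on the Spark build bundling ORC 1.5.3 or later, which is what this issue proposes.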



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

