spark-issues mailing list archives

From "Dongjoon Hyun (JIRA)" <>
Subject [jira] [Assigned] (SPARK-25635) Support selective direct encoding in native ORC write
Date Wed, 03 Oct 2018 21:13:00 GMT


Dongjoon Hyun reassigned SPARK-25635:

    Assignee: Dongjoon Hyun

> Support selective direct encoding in native ORC write
> -----------------------------------------------------
>                 Key: SPARK-25635
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
> Before ORC 1.5.3, `orc.dictionary.key.threshold` and `hive.exec.orc.dictionary.key.size.threshold`
> are applied to all columns. Because one threshold governs every column, this is a big hurdle to enabling dictionary encoding.
> Starting with ORC 1.5.3, `` is added to enforce direct encoding selectively, column by column.
> This issue aims to add that feature by upgrading ORC from 1.5.2 to
> The following are the patches in ORC 1.5.3; this feature is the only one directly related to Spark.
> {code}
> ORC-406: ORC: Char(n) and Varchar(n) writers truncate to n bytes & corrupts multi-byte data (gopalv)
> ORC-403: [C++] Add checks to avoid invalid offsets in InputStream
> ORC-405. Remove calcite as a dependency from the benchmarks.
> ORC-375: Fix libhdfs on gcc7 by adding #include <functional> two places.
> ORC-383: Parallel builds fails with ConcurrentModificationException
> ORC-382: Apache rat exclusions + add rat check to travis
> ORC-401: Fix incorrect quoting in specification.
> ORC-385. Change RecordReader to extend Closeable.
> ORC-384: [C++] fix memory leak when loading non-ORC files
> ORC-391: [c++] parseType does not accept underscore in the field name
> ORC-397. Allow selective disabling of dictionary encoding. Original patch was by Mithun
> ORC-389: Add ability to not decode Acid metadata columns
> {code}
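
The threshold problem described in the issue can be illustrated with a small model. This is a simplified Python sketch, not ORC's writer code: it assumes that ORC picks dictionary encoding for a string column when the ratio of distinct keys to non-null rows is at most the global threshold (0.8 is assumed here as the default), and it ignores when and how often the real writer performs that check. `direct_columns` is a hypothetical stand-in for the new column-wise option, not its real name; `choose_encoding`, `plan`, and the sample data are likewise illustrative.

```python
from typing import AbstractSet, Dict, List, Optional, Sequence

# Assumed default for the global key-size threshold (orc.dictionary.key.threshold).
DEFAULT_KEY_THRESHOLD = 0.8


def choose_encoding(
    column: str,
    values: Sequence[Optional[str]],
    key_threshold: float = DEFAULT_KEY_THRESHOLD,
    direct_columns: AbstractSet[str] = frozenset(),
) -> str:
    """Simplified model: pick 'DICTIONARY' or 'DIRECT' for one string column."""
    # Hypothetical per-column override: force direct encoding for listed columns.
    if column in direct_columns:
        return "DIRECT"
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "DICTIONARY"  # nothing to measure; treat the ratio as 0
    # Fraction of distinct keys over non-null rows, compared with the global threshold.
    ratio = len(set(non_null)) / len(non_null)
    return "DICTIONARY" if ratio <= key_threshold else "DIRECT"


def plan(
    columns: Dict[str, List[Optional[str]]],
    key_threshold: float = DEFAULT_KEY_THRESHOLD,
    direct_columns: AbstractSet[str] = frozenset(),
) -> Dict[str, str]:
    """Apply choose_encoding to every column with the same settings."""
    return {
        name: choose_encoding(name, values, key_threshold, direct_columns)
        for name, values in columns.items()
    }


# Example data: 'city' has 9 distinct values in 10 rows (ratio 0.9),
# 'uuid' is unique per row (ratio 1.0).
COLUMNS: Dict[str, List[Optional[str]]] = {
    "city": ["Seoul", "Busan", "Seoul", "Incheon", "Daegu",
             "Daejeon", "Gwangju", "Ulsan", "Suwon", "Jeju"],
    "uuid": [f"id-{i}" for i in range(10)],
}

if __name__ == "__main__":
    # Global threshold only: 'city' (0.9 > 0.8) falls back to direct encoding.
    print(plan(COLUMNS))                 # {'city': 'DIRECT', 'uuid': 'DIRECT'}
    # Raising the threshold to get a dictionary for 'city' also affects 'uuid'.
    print(plan(COLUMNS, 1.0))            # {'city': 'DICTIONARY', 'uuid': 'DICTIONARY'}
    # With a per-column override, 'uuid' can stay direct.
    print(plan(COLUMNS, 1.0, {"uuid"}))  # {'city': 'DICTIONARY', 'uuid': 'DIRECT'}
```

The three calls show why a single global threshold is a hurdle: raising it so that `city` gets a dictionary also dictionary-encodes the unique `uuid` column, while a per-column override lets each column keep the encoding that suits it.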
