hive-issues mailing list archives

From "Teddy Choi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables
Date Sun, 02 Jul 2017 12:28:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-12631:
------------------------------
    Attachment: HIVE-12631.11.patch

The 11th patch makes two major changes. The first is a new ORC ACID row batch encoded data
consumer, which adds a vectorized ORC ACID row batch reader to LLAP and should perform well
for ACID reads in LLAP. The second generalizes the reader used by the ORC raw record merger:
the ACID logic can now work with other readers, not only the ORC reader.
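
To illustrate the reader generalization, here is a minimal sketch of a merger that depends only on a small reader interface, so any source of events (an ORC reader, an LLAP reader, or a test list) can feed it. All names (RowEvent, EventReader, ListEventReader, SimpleMerger) are hypothetical and are not Hive classes, and the row key is reduced to a single rowId. It assumes both inputs are sorted by rowId and that the base contains no delete events.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical event type for illustration only; not a Hive class.
final class RowEvent {
    final long rowId;      // simplified stand-in for the ACID row key
    final boolean delete;  // true if this event deletes the row
    final String value;    // payload; unused for delete events

    RowEvent(long rowId, boolean delete, String value) {
        this.rowId = rowId;
        this.delete = delete;
        this.value = value;
    }
}

// Hypothetical abstraction: the merger sees only this interface,
// never a concrete ORC reader.
interface EventReader {
    /** Returns the next event in ascending rowId order, or null when exhausted. */
    RowEvent next();
}

// One possible implementation, backed by an in-memory list.
final class ListEventReader implements EventReader {
    private final Iterator<RowEvent> it;

    ListEventReader(List<RowEvent> events) {
        this.it = events.iterator();
    }

    @Override
    public RowEvent next() {
        return it.hasNext() ? it.next() : null;
    }
}

final class SimpleMerger {
    /** Merges a base reader with one delta reader; delta events win on equal rowId. */
    static List<String> merge(EventReader base, EventReader delta) {
        List<String> out = new ArrayList<>();
        RowEvent b = base.next();
        RowEvent d = delta.next();
        while (b != null || d != null) {
            if (d == null || (b != null && b.rowId < d.rowId)) {
                out.add(b.value);      // base row with no overriding delta event
                b = base.next();
            } else {
                if (b != null && b.rowId == d.rowId) {
                    b = base.next();   // the delta event supersedes the base row
                }
                if (!d.delete) {
                    out.add(d.value);  // insert or update coming from the delta
                }
                d = delta.next();
            }
        }
        return out;
    }
}
```

Because SimpleMerger only calls EventReader.next(), swapping in a different reader implementation requires no change to the merge logic, which is the point of the generalization described above.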

This patch enables the following work in other issues:
# Introducing the LLAP record reader in the ORC raw record merger to minimize non-LLAP reads
# Replacing BitSet objects with integer arrays for better performance
# Adding the vectorized ORC ACID row reader in LLAP
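
As a rough illustration of the BitSet item above, the sketch below builds the list of selected row indexes for a batch in two ways: through an intermediate java.util.BitSet, and by writing indexes directly into an int array. The class and method names (SelectedRows, viaBitSet, viaIntArray) and the boolean visibility input are assumptions made for this example; the actual change in the follow-up issue may look quite different.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical helper for illustration only; not a Hive class.
final class SelectedRows {
    /** BitSet approach: mark visible rows first, then convert to an index array. */
    static int[] viaBitSet(boolean[] visible) {
        BitSet bits = new BitSet(visible.length);
        for (int i = 0; i < visible.length; i++) {
            if (visible[i]) {
                bits.set(i);
            }
        }
        int[] selected = new int[bits.cardinality()];
        int n = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            selected[n++] = i;
        }
        return selected;
    }

    /** Integer-array approach: write indexes directly, with no intermediate BitSet. */
    static int[] viaIntArray(boolean[] visible) {
        int[] selected = new int[visible.length];
        int n = 0;
        for (int i = 0; i < visible.length; i++) {
            if (visible[i]) {
                selected[n++] = i;
            }
        }
        return Arrays.copyOf(selected, n);  // trim to the number of selected rows
    }
}
```

Both methods return the same indexes; the second avoids allocating the BitSet and making a second pass over it.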

> LLAP: support ORC ACID tables
> -----------------------------
>
>                 Key: HIVE-12631
>                 URL: https://issues.apache.org/jira/browse/HIVE-12631
>             Project: Hive
>          Issue Type: Bug
>          Components: llap, Transactions
>            Reporter: Sergey Shelukhin
>            Assignee: Teddy Choi
>         Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, HIVE-12631.11.patch, HIVE-12631.1.patch,
>                      HIVE-12631.2.patch, HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, HIVE-12631.6.patch,
>                      HIVE-12631.7.patch, HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and parallelization
> of reads and processing. This path does not support ACID. As far as I remember, the ACID logic
> is embedded inside the ORC format; we need to refactor it to sit on top of some interface, if
> practical, or just port it to the LLAP read path.
> Another consideration is how the logic will work with the cache. The cache is currently low-level
> (CB-level in ORC), so we could just use it to read bases and deltas (deltas should be cached
> with higher priority) and merge as usual. We could also cache the merged representation in the future.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
