hive-issues mailing list archives

From "Saket Saurabh (JIRA)" <>
Subject [jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions
Date Thu, 11 Aug 2016 10:08:20 GMT


Saket Saurabh updated HIVE-14035:
    Attachment: HIVE-14035.14.patch

Patch #14 significantly refactors how split strategies are chosen in the ACID split-update
case and now sets the isOriginal flag correctly on a per-split basis. When split-update is
enabled, a split on a base file can be one of three types: a split on an original_base, a
split on a compacted_base, or a split on an insert_delta. A single job may therefore end up
with a set of OrcSplits that covers both original and insert_delta files. In such cases it
is essential that the isOriginal flag is set correctly for each split; otherwise it breaks the
way split strategies are used to instantiate several components. This patch takes care of that.
Additionally, the patch now optimizes the case in which we had to process uncovered buckets
because the split had no base (previously possible when we had only deltas). Now, when
split-update is enabled, every split has a base, because there is no point in having a split
that is supposed to read only the delete_deltas. (Minor compaction is not a concern here: it
always creates a single split through separate logic, and that logic has not been modified.)
Tests covering all of these changes have been added to TestInputOutputFormat for various
scenarios. The patch also addresses the comments at RB.
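
To make the per-split flag concrete, here is a minimal Python sketch. It is not the code in the
patch: the names BaseKind, Split, classify and make_splits are hypothetical, and the sketch
assumes the kind of a file can be read off its parent directory name (base_*, delta_*,
delete_delta_*), with anything else treated as an original file.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class BaseKind(Enum):
    ORIGINAL_BASE = "original_base"
    COMPACTED_BASE = "compacted_base"
    INSERT_DELTA = "insert_delta"


@dataclass(frozen=True)
class Split:
    path: str
    kind: BaseKind
    is_original: bool      # decided per split, not once per job
    delete_deltas: tuple   # delete events every split has to apply


def _parent_dir(path):
    return PurePosixPath(path).parent.name


def is_delete_delta(path):
    # Assumption: delete events live under a delete_delta_* directory.
    return _parent_dir(path).startswith("delete_delta_")


def classify(path):
    """Map a file to one of the three base kinds named in the comment above."""
    parent = _parent_dir(path)
    if parent.startswith("delete_delta_"):
        raise ValueError("a delete_delta never anchors a split on its own")
    if parent.startswith("base_"):
        return BaseKind.COMPACTED_BASE
    if parent.startswith("delta_"):
        return BaseKind.INSERT_DELTA
    return BaseKind.ORIGINAL_BASE


def make_splits(files):
    """Build one split per base-like file; delete_deltas only ride along."""
    deletes = tuple(sorted(f for f in files if is_delete_delta(f)))
    splits = []
    for f in files:
        if is_delete_delta(f):
            continue
        kind = classify(f)
        splits.append(Split(f, kind, kind is BaseKind.ORIGINAL_BASE, deletes))
    return splits
```

Given one original file, one insert_delta and one delete_delta, make_splits returns two splits
whose is_original flags differ, and no split is created for the delete_delta alone.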

> Enable predicate pushdown to delta files created by ACID Transactions
> ---------------------------------------------------------------------
>                 Key: HIVE-14035
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, HIVE-14035.04.patch, HIVE-14035.05.patch,
> HIVE-14035.06.patch, HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, HIVE-14035.10.patch,
> HIVE-14035.11.patch, HIVE-14035.12.patch, HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.patch
>
> In the current Hive version, delta files created by ACID transactions do not allow predicate
> pushdown if they contain any update/delete events. This restriction preserves correctness under
> the multi-version approach used during event collapsing, where an update event overwrites
> an existing insert event.
> This JIRA proposes to split an update event into a delete event followed by a new insert
> event, which enables predicate pushdown to all delta files without breaking correctness.
> To support backward compatibility for this feature, this JIRA also proposes to add some
> form of versioning to ACID so that different versions of ACID transactions can coexist.
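
The correctness argument in the description can be illustrated with a toy model in Python.
This is a sketch under stated assumptions, not Hive's reader: events are plain tuples, the
function names are hypothetical, and the new insert half of a split update is assumed to
receive a fresh row id.

```python
def read_collapsing(events, predicate, pushdown):
    """Old model: an update event overwrites the earlier insert for a row.

    events: list of (row_id, txn_id, value). With pushdown=True the predicate
    is applied to raw events before they are collapsed.
    """
    if pushdown:
        events = [e for e in events if predicate(e[2])]
    latest = {}
    for row_id, _txn_id, value in sorted(events, key=lambda e: e[1]):
        latest[row_id] = value  # a later event replaces the earlier one
    return sorted((r, v) for r, v in latest.items() if predicate(v))


def read_split_update(inserts, deletes, predicate):
    """Split-update model: update = delete(old row) + insert(new row).

    inserts: list of (row_id, value); deletes: iterable of deleted row ids.
    The predicate is pushed down to insert events only; delete events are
    always applied in full.
    """
    deleted = set(deletes)
    return sorted((r, v) for r, v in inserts if predicate(v) and r not in deleted)


def small(value):
    return value < 20


# Row 1 is inserted with value 10, then updated to 50.
old_events = [(1, 1, 10), (1, 2, 50)]
new_inserts = [(1, 10), (2, 50)]   # the updated row gets a new id
new_deletes = [1]                  # the old version is deleted

without_pushdown = read_collapsing(old_events, small, pushdown=False)
with_pushdown = read_collapsing(old_events, small, pushdown=True)
split_update = read_split_update(new_inserts, new_deletes, small)
```

In the old model, pushing the predicate down discards the update event (50) before collapsing,
so the stale value 10 is wrongly returned. In the split-update model the stale row is removed
by its delete event regardless of the predicate, so the result matches the one obtained
without pushdown.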

This message was sent by Atlassian JIRA
