hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <>
Subject [jira] [Commented] (HIVE-20941) Compactor produces a delete_delta_x_y even if there are no input delete events
Date Fri, 14 Dec 2018 00:08:00 GMT


Eugene Koifman commented on HIVE-20941:

Notes for myself:

{{AcidUtils.getAcidState()}} has
else if (prev != null && next.maxWriteId == prev.maxWriteId
                  && next.minWriteId == prev.minWriteId
                  && next.statementId == prev.statementId) {
        // The 'next' parsedDelta may have everything equal to the 'prev' parsedDelta, except
        // the path. This may happen when we have split update and we have two types of delta
        // directories- 'delta_x_y' and 'delete_delta_x_y' for the SAME txn range.

        // Also note that any delete_deltas in between a given delta_x_y range would be made
        // obsolete. For example, a delta_30_50 would make delete_delta_40_40 obsolete.
        // This is valid because minor compaction always compacts the normal deltas and the
        // delete deltas for the same range. That is, if we had 3 directories, delta_30_30,
        // delete_delta_40_40 and delta_50_50, then running minor compaction would produce
        // delta_30_50 and delete_delta_30_50.

        prev = next;

{{AcidUtils.ParsedDelta.compareTo()}} sorts delta_x_y after delete_delta_x_y
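
To make the ordering and the "same range, different path" branch concrete, here is a minimal sketch. {{SketchDelta}} and {{AcidStateSketch}} are hypothetical stand-ins, not the Hive classes. The walk ignores the valid write id list and base directories, and the "wider range first" tie-break is an assumption that the note above does not state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for AcidUtils.ParsedDelta (not the Hive class).
class SketchDelta implements Comparable<SketchDelta> {
    final long minWriteId;
    final long maxWriteId;
    final int statementId;
    final boolean isDeleteDelta;

    SketchDelta(long minWriteId, long maxWriteId, int statementId, boolean isDeleteDelta) {
        this.minWriteId = minWriteId;
        this.maxWriteId = maxWriteId;
        this.statementId = statementId;
        this.isDeleteDelta = isDeleteDelta;
    }

    // Unpadded name, used here only for readability.
    String name() {
        return (isDeleteDelta ? "delete_delta_" : "delta_") + minWriteId + "_" + maxWriteId;
    }

    @Override
    public int compareTo(SketchDelta o) {
        if (minWriteId != o.minWriteId) {
            return Long.compare(minWriteId, o.minWriteId);
        }
        if (maxWriteId != o.maxWriteId) {
            return Long.compare(o.maxWriteId, maxWriteId); // assumption: wider range first
        }
        if (statementId != o.statementId) {
            return Integer.compare(statementId, o.statementId);
        }
        // Same range and statement: delete_delta_x_y sorts before delta_x_y.
        return Boolean.compare(o.isDeleteDelta, isDeleteDelta);
    }
}

class AcidStateSketch {
    final List<SketchDelta> current = new ArrayList<>();
    final List<SketchDelta> obsolete = new ArrayList<>();

    // Simplified walk over the sorted deltas; valid-write-id checks are omitted.
    static AcidStateSketch of(List<SketchDelta> dirs) {
        List<SketchDelta> working = new ArrayList<>(dirs);
        Collections.sort(working);
        AcidStateSketch state = new AcidStateSketch();
        long currentMax = 0;
        SketchDelta prev = null;
        for (SketchDelta next : working) {
            if (next.maxWriteId > currentMax) {
                state.current.add(next);
                currentMax = next.maxWriteId;
                prev = next;
            } else if (prev != null && next.maxWriteId == prev.maxWriteId
                    && next.minWriteId == prev.minWriteId
                    && next.statementId == prev.statementId) {
                // Same range as 'prev', only the directory type differs: keep both.
                state.current.add(next);
                prev = next;
            } else {
                // Covered by a wider delta that was already accepted.
                state.obsolete.add(next);
            }
        }
        return state;
    }
}
```

Given delta_30_30, delete_delta_40_40 and delta_50_50 plus the compacted delta_30_50 and delete_delta_30_50, this sketch keeps the two compacted directories as current and marks the three originals obsolete.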

{{}} calls getAcidState() and looks at all the deltas (insert + delete) to
find min/max for delta_min_max that it will produce.
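
A rough illustration of that min/max step, assuming each input directory is reduced to a {minWriteId, maxWriteId} pair. {{CompactionRangeSketch}} and its methods are made-up names for this note, and the output names are unpadded for readability.

```java
import java.util.List;

class CompactionRangeSketch {
    // Each long[] is {minWriteId, maxWriteId} for one input directory,
    // insert delta or delete delta alike.
    static long[] range(List<long[]> dirs) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("no delta directories to compact");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long[] d : dirs) {
            min = Math.min(min, d[0]);
            max = Math.max(max, d[1]);
        }
        return new long[] {min, max};
    }

    // prefix is "delta_" or "delete_delta_"; both outputs share the same range.
    static String outputName(String prefix, long[] range) {
        return prefix + range[0] + "_" + range[1];
    }
}
```

Because delete deltas take part in the range, a delete_delta_40_40 sitting between delta_30_30 and delta_50_50 does not change the result here, but a delete delta outside the insert range would widen it.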

{{}} feeds all delta dir Paths to {{OrcRawRecordMerger}} which does a multiway
merge to output a single stream of events that can be either Insert or Delete. {{map()}} then
splits the stream into 2 according to this type.
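
The merge-then-split step can be sketched as below. This is not {{OrcRawRecordMerger}}: a single long key stands in for the real record key, all class and method names are hypothetical, and each output list stands in for a writer. Creating the output only when the first event of its type arrives is an assumption about how to avoid an empty delete_delta_x_y, not a description of the attached patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class SplitMergeSketch {
    enum Op { INSERT, DELETE }

    static final class Event {
        final long rowKey; // stand-in for the real record key
        final Op op;

        Event(long rowKey, Op op) {
            this.rowKey = rowKey;
            this.op = op;
        }
    }

    static final class Output {
        List<Event> inserts; // stays null unless an insert event is seen
        List<Event> deletes; // stays null unless a delete event is seen
    }

    // Multiway merge of inputs that are each already sorted by rowKey,
    // routing every merged event to the output for its type.
    static Output mergeAndSplit(List<List<Event>> sortedInputs) {
        // Each heap entry is a cursor: {input index, position in that input}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparingLong((int[] c) -> sortedInputs.get(c[0]).get(c[1]).rowKey));
        for (int i = 0; i < sortedInputs.size(); i++) {
            if (!sortedInputs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        Output out = new Output();
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();
            List<Event> source = sortedInputs.get(cursor[0]);
            Event e = source.get(cursor[1]);
            if (e.op == Op.DELETE) {
                if (out.deletes == null) {
                    out.deletes = new ArrayList<>(); // lazily "open" the delete output
                }
                out.deletes.add(e);
            } else {
                if (out.inserts == null) {
                    out.inserts = new ArrayList<>(); // lazily "open" the insert output
                }
                out.inserts.add(e);
            }
            if (cursor[1] + 1 < source.size()) {
                heap.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return out;
    }
}
```

With inputs that contain only insert events, {{deletes}} stays null in this sketch, which corresponds to producing {{delta_x_y}} without a {{delete_delta_x_y}}.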

So the invariant remains the same: for any given x, y we can have {{delta_x_y}}, or ({{delta_x_y}}
and {{delete_delta_x_y}}), or {{delete_delta_x_y}}, just like before this change.

I tweaked the text of a comment, so attaching patch 6 for completeness.

> Compactor produces a delete_delta_x_y even if there are no input delete events
> ------------------------------------------------------------------------------
>                 Key: HIVE-20941
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Assignee: Igor Kryvenko
>            Priority: Major
>         Attachments: HIVE-20941.01.patch, HIVE-20941.02.patch, HIVE-20941.03.patch, HIVE-20941.04.patch, HIVE-20941.05.patch, HIVE-20941.06.patch
> see example in HIVE-20901
> Probably change logic in which creates delete event writer
