hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-20941) Compactor produces a delete_delta_x_y even if there are no input delete events
Date Fri, 14 Dec 2018 00:08:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720714#comment-16720714 ]

Eugene Koifman commented on HIVE-20941:
---------------------------------------

Notes for myself:

{{AcidUtils.getAcidState()}} has
{code:java}
else if (prev != null && next.maxWriteId == prev.maxWriteId
                  && next.minWriteId == prev.minWriteId
                  && next.statementId == prev.statementId) {
        // The 'next' parsedDelta may have everything equal to the 'prev' parsedDelta, except
        // the path. This may happen when we have split update and we have two types of delta
        // directories- 'delta_x_y' and 'delete_delta_x_y' for the SAME txn range.

        // Also note that any delete_deltas in between a given delta_x_y range would be made
        // obsolete. For example, a delta_30_50 would make delete_delta_40_40 obsolete.
        // This is valid because minor compaction always compacts the normal deltas and the
        // delete deltas for the same range. That is, if we had 3 directories, delta_30_30,
        // delete_delta_40_40 and delta_50_50, then running minor compaction would produce
        // delta_30_50 and delete_delta_30_50.

        deltas.add(next);
        prev = next;
      }
{code}
{{AcidUtils.ParsedDelta.compareTo()}} sorts {{delta_x_y}} after {{delete_delta_x_y}}.
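
To make the ordering concrete, here is a minimal sketch. It is not the Hive code: {{Dir}} is a hypothetical stand-in for {{ParsedDelta}}, the order of the numeric keys is an arbitrary choice for this example, and the tie-break on the directory name is an assumption about how equal ranges get ordered. It only shows that a plain string comparison puts {{delete_delta_x_y}} before {{delta_x_y}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DeltaOrderSketch {
    // Hypothetical stand-in for AcidUtils.ParsedDelta; not the Hive class.
    static final class Dir implements Comparable<Dir> {
        final long minWriteId;
        final long maxWriteId;
        final int statementId;
        final String name;

        Dir(long minWriteId, long maxWriteId, int statementId, String name) {
            this.minWriteId = minWriteId;
            this.maxWriteId = maxWriteId;
            this.statementId = statementId;
            this.name = name;
        }

        @Override
        public int compareTo(Dir other) {
            // Key order for the numeric fields is arbitrary in this sketch.
            int c = Long.compare(minWriteId, other.minWriteId);
            if (c != 0) return c;
            c = Long.compare(maxWriteId, other.maxWriteId);
            if (c != 0) return c;
            c = Integer.compare(statementId, other.statementId);
            if (c != 0) return c;
            // Assumed tie-break: plain string order on the directory name.
            return name.compareTo(other.name);
        }
    }

    // Returns the directory names in sorted order without mutating the input.
    static List<String> sortedNames(List<Dir> dirs) {
        List<Dir> copy = new ArrayList<>(dirs);
        Collections.sort(copy);
        List<String> names = new ArrayList<>();
        for (Dir d : copy) {
            names.add(d.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Dir> dirs = Arrays.asList(
            new Dir(30, 50, -1, "delta_30_50"),
            new Dir(30, 50, -1, "delete_delta_30_50"));
        // prints [delete_delta_30_50, delta_30_50]
        System.out.println(sortedNames(dirs));
    }
}
```

With everything else equal, "delete_delta" compares lower than "delta" at the fourth character ('e' vs 't'), which is why the delete directory comes first.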

{{CompactorMR.run()}} calls {{getAcidState()}} and looks at all the deltas (insert + delete) to
find the min/max write IDs for the {{delta_min_max}} that it will produce.
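
A rough sketch of that min/max computation, using the example from the code comment above. The class and method names are made up for illustration, and the parsing assumes the short {{delta_x_y}} / {{delete_delta_x_y}} form used in these notes with no statement id suffix; it is not how {{CompactorMR}} parses directories.

```java
import java.util.Arrays;
import java.util.List;

public class CompactionRangeSketch {
    // Returns {min, max} over all input directories, insert and delete alike.
    // Assumes names end in "_<minWriteId>_<maxWriteId>" (hypothetical short form).
    static long[] range(List<String> dirNames) {
        if (dirNames.isEmpty()) {
            throw new IllegalArgumentException("no delta directories");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (String name : dirNames) {
            String[] parts = name.split("_");
            long lo = Long.parseLong(parts[parts.length - 2]);
            long hi = Long.parseLong(parts[parts.length - 1]);
            min = Math.min(min, lo);
            max = Math.max(max, hi);
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("delta_30_30", "delete_delta_40_40", "delta_50_50");
        long[] r = range(dirs);
        // prints delta_30_50
        System.out.println("delta_" + r[0] + "_" + r[1]);
    }
}
```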

{{CompactorMap.map()}} feeds all delta dir Paths to {{OrcRawRecordMerger}} which does a multiway
merge to output a single stream of events that can be either Insert or Delete. {{map()}} then
splits the stream into 2 according to this type.
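
A sketch of the split step only, assuming the merge has already produced a single ordered list of events. {{Event}}, {{Op}} and {{split()}} are hypothetical names, and an in-memory map stands in for the real writers. The point it illustrates is the one this issue is about: if an output is created only when the first event of its type arrives, an input with no delete events yields no {{delete_delta_x_y}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitEventsSketch {
    enum Op { INSERT, DELETE }

    // Hypothetical event type; the real merger emits ORC records.
    static final class Event {
        final Op op;
        final long rowId;

        Event(Op op, long rowId) {
            this.op = op;
            this.rowId = rowId;
        }
    }

    // Maps output directory name -> row ids routed to it.
    // An entry is created lazily, on the first event of that type.
    static Map<String, List<Long>> split(List<Event> merged, long min, long max) {
        Map<String, List<Long>> outputs = new LinkedHashMap<>();
        for (Event e : merged) {
            String prefix = (e.op == Op.DELETE) ? "delete_delta_" : "delta_";
            String dir = prefix + min + "_" + max;
            outputs.computeIfAbsent(dir, k -> new ArrayList<>()).add(e.rowId);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Event> insertsOnly = Arrays.asList(
            new Event(Op.INSERT, 1L),
            new Event(Op.INSERT, 2L));
        // Only delta_30_50 appears; no delete_delta_30_50 is created.
        System.out.println(split(insertsOnly, 30, 50).keySet());
    }
}
```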

So the invariant remains the same: for any given x, y we can have {{delta_x_y}}, or ({{delta_x_y}}
and {{delete_delta_x_y}}), or {{delete_delta_x_y}}, just like before this change.

I tweaked the text of a comment, so attaching patch 6 for completeness.

> Compactor produces a delete_delta_x_y even if there are no input delete events
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-20941
>                 URL: https://issues.apache.org/jira/browse/HIVE-20941
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.0.0
>            Reporter: Eugene Koifman
>            Assignee: Igor Kryvenko
>            Priority: Major
>         Attachments: HIVE-20941.01.patch, HIVE-20941.02.patch, HIVE-20941.03.patch, HIVE-20941.04.patch, HIVE-20941.05.patch, HIVE-20941.06.patch
>
>
> see example in HIVE-20901
>  
> Probably change logic in CompactorMR.CompactorMap.map() which creates delete event writer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
