hive-dev mailing list archives

From "Prasanth J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8151) Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy
Date Tue, 16 Sep 2014 22:37:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136396#comment-14136396
] 

Prasanth J commented on HIVE-8151:
----------------------------------

The vectorization test case in dynpart_sort_optimization2.q must be revisited, as HIVE-7557
disabled VectorFileSinkOperator. The proper fix for HIVE-7557 should make VectorFS inherit
most of its behavior from the FS operator. The current code in trunk is stale: VectorFS still
contains old code from FS, which has gone through many changes recently. cc/ [~mmccline]

> Dynamic partition sort optimization inserts record wrongly to partition when used with GroupBy
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8151
>                 URL: https://issues.apache.org/jira/browse/HIVE-8151
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Prasanth J
>            Assignee: Prasanth J
>            Priority: Critical
>         Attachments: HIVE-8151.1.patch
>
>
> HIVE-6455 added the dynamic partition sort optimization. It added a startGroup() method to
> the FileSink operator to detect changes in the reduce key and create partition directories.
> This method, however, is not reliable, because the key passed to startGroup() differs from
> the key passed to processOp(): startGroup() is called with the newly changed key, whereas
> processOp() is called with the previously aggregated key. As a result, processOp() writes
> the last row of the previous group as the first row of the next group. This happens only
> when the optimization is used with a group by operator.
> The fix is to not rely on startGroup() and to do the partition directory creation in
> processOp() itself.
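
The key mismatch described above can be illustrated with a small standalone simulation. This is only a sketch under stated assumptions, not Hive's actual code: DynPartSketch, Sink, StartGroupSink, ProcessOpSink and run() are hypothetical names, and the call order inside run() (startGroup() with the new key first, then the flush of the previous group's aggregate) is assumed from the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation; none of these types are Hive's real operators.
class DynPartSketch {

    /** Minimal stand-in for a file sink that writes rows into partitions. */
    interface Sink {
        void startGroup(String newKey);
        void processOp(String rowKey, long count);
        Map<String, List<String>> partitions();
    }

    /** Behavior described as buggy: the partition is chosen in startGroup(). */
    static class StartGroupSink implements Sink {
        private final Map<String, List<String>> out = new LinkedHashMap<>();
        private String currentPartition;

        public void startGroup(String newKey) {
            currentPartition = newKey; // already points at the NEXT group
        }

        public void processOp(String rowKey, long count) {
            // The row still belongs to the previous key, but lands in the new partition.
            out.computeIfAbsent(currentPartition, k -> new ArrayList<>())
               .add(rowKey + "=" + count);
        }

        public Map<String, List<String>> partitions() { return out; }
    }

    /** Behavior described as the fix: the partition is derived from the row itself. */
    static class ProcessOpSink implements Sink {
        private final Map<String, List<String>> out = new LinkedHashMap<>();

        public void startGroup(String newKey) {
            // Intentionally ignored.
        }

        public void processOp(String rowKey, long count) {
            out.computeIfAbsent(rowKey, k -> new ArrayList<>())
               .add(rowKey + "=" + count);
        }

        public Map<String, List<String>> partitions() { return out; }
    }

    /** Simulated reduce side: a count-per-key group by feeding a sink. */
    static Map<String, List<String>> run(Sink sink, List<String> sortedKeys) {
        String prevKey = null;
        long count = 0;
        for (String key : sortedKeys) {
            if (!key.equals(prevKey)) {
                // Assumed order: the new key is announced first ...
                sink.startGroup(key);
                // ... and only then is the previous group's aggregate emitted.
                if (prevKey != null) {
                    sink.processOp(prevKey, count);
                }
                prevKey = key;
                count = 0;
            }
            count++;
        }
        if (prevKey != null) {
            sink.processOp(prevKey, count); // flush the final group
        }
        return sink.partitions();
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "a", "b", "c", "c", "c");
        // prints {b=[a=2], c=[b=1, c=3]}
        System.out.println(run(new StartGroupSink(), keys));
        // prints {a=[a=2], b=[b=1], c=[c=3]}
        System.out.println(run(new ProcessOpSink(), keys));
    }
}
```

In this sketch the startGroup()-based sink files each aggregated row under the following key's partition, while the processOp()-based sink places every row under its own key, which matches the direction of the fix described above.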



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
