hive-issues mailing list archives

From "Zoltan Haindrich (Jira)" <>
Subject [jira] [Commented] (HIVE-21304) Show Bucketing version for ReduceSinkOp in explain extended plan
Date Tue, 03 Mar 2020 08:54:00 GMT


Zoltan Haindrich commented on HIVE-21304:

I've looked into why this strange thing happens; apparently the bucket2 test can non-deterministically
end up using bucketingVersion 1 or 2.

After debugging the issue I've found what's causing the strange behaviour (and the problems).
The goal of [this method|]
is to set the appropriate bucketingVersion for the RS operator.

In case an RS has 2 FileSink (grand*)children with different bucketingVersions, it may end
up using either of them non-deterministically.
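
To illustrate the kind of order dependence described above, here is a minimal toy sketch. It is not Hive code: the Op class, the plan() helper and both methods are hypothetical stand-ins, and the assumption that the two sinks carry versions 2 and 1 is only for illustration. It shows how a "first FileSink found wins" lookup gives different answers depending on child order, while collecting all descendant versions makes the conflict visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BucketingVersionSketch {

    /** Hypothetical stand-in for a plan operator; NOT Hive's Operator class. */
    public static final class Op {
        final String name;
        final int bucketingVersion; // -1 means "no bucketing version on this op"
        final List<Op> children = new ArrayList<>();

        public Op(String name, int bucketingVersion) {
            this.name = name;
            this.bucketingVersion = bucketingVersion;
        }

        public Op add(Op child) {
            children.add(child);
            return this;
        }
    }

    /** Builds RS -> SEL -> {FS(first), FS(second)}; the versions are assumed values. */
    public static Op plan(int firstVersion, int secondVersion) {
        Op sel = new Op("SEL", -1)
                .add(new Op("FS#1", firstVersion))
                .add(new Op("FS#2", secondVersion));
        return new Op("RS", -1).add(sel);
    }

    /** "First one found wins": the answer depends on the order of the children. */
    public static int firstFoundVersion(Op op) {
        for (Op child : op.children) {
            if (child.bucketingVersion != -1) {
                return child.bucketingVersion;
            }
            int v = firstFoundVersion(child);
            if (v != -1) {
                return v;
            }
        }
        return -1;
    }

    /** Collects every distinct version below op, so a conflict can be detected. */
    public static Set<Integer> descendantVersions(Op op) {
        Set<Integer> versions = new TreeSet<>();
        for (Op child : op.children) {
            if (child.bucketingVersion != -1) {
                versions.add(child.bucketingVersion);
            }
            versions.addAll(descendantVersions(child));
        }
        return versions;
    }

    public static void main(String[] args) {
        // Same two sinks, only the child order differs.
        System.out.println(firstFoundVersion(plan(2, 1)));  // prints 2
        System.out.println(firstFoundVersion(plan(1, 2)));  // prints 1
        // Order-independent view: more than one entry means the sinks disagree.
        System.out.println(descendantVersions(plan(2, 1))); // prints [1, 2]
    }
}
```

In this sketch the set-based view is the same regardless of child order, so a caller could decide explicitly what to do when it contains more than one version instead of silently taking whichever sink is visited first.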

In the "bucket2" test case the 2 FileSinks are:
* saving the table result into a "version2" table
* column-stats gathering

note: the column-stats aggregate ends up here because the test has 1 reducer, so it can execute
the full aggregate as well...

> Show Bucketing version for ReduceSinkOp in explain extended plan
> ----------------------------------------------------------------
>                 Key: HIVE-21304
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-21304.01.patch, HIVE-21304.02.patch, HIVE-21304.03.patch, HIVE-21304.04.patch,
HIVE-21304.05.patch, HIVE-21304.06.patch, HIVE-21304.07.patch, HIVE-21304.08.patch, HIVE-21304.09.patch,
HIVE-21304.10.patch, HIVE-21304.11.patch, HIVE-21304.12.patch, HIVE-21304.13.patch, HIVE-21304.14.patch,
HIVE-21304.15.patch, HIVE-21304.16.patch, HIVE-21304.17.patch, HIVE-21304.18.patch
> Show Bucketing version for ReduceSinkOp in explain extended plan.
> This helps identify what hashing algorithm is being used by ReduceSinkOp.
> cc [~vgarg]

This message was sent by Atlassian Jira
