hive-issues mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Commented] (HIVE-15844) Make ReduceSinkOperator independent of Acid
Date Wed, 01 Mar 2017 21:42:45 GMT


Ashutosh Chauhan commented on HIVE-15844:

Thanks for taking this up, [~ekoifman]. Such refactoring is very much needed to preserve the sanity
of devs and the readability of the code.

I am still troubled by SPDO inserting a "constant" bucket_number column and the RS magically
computing and replacing that constant at runtime. Ideally it should be created as a column
expression that is evaluated like any other expression in the RS (or perhaps in a SEL prior to it).
I am hopeful that refactoring will happen someday as well :)
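
As a rough illustration of the column-expression approach, here is a toy Java model, not Hive code. Every name in it (Expr, ColumnExpr, BucketExpr, evalKeys) is hypothetical, and the hash-and-modulo bucketing is an assumption made for the sketch. The point is only that the bucket number is built as an ordinary expression and evaluated per row like any other key, with no placeholder constant to swap out at runtime.

```java
import java.util.Arrays;
import java.util.List;

public class BucketExprSketch {
    /** Hypothetical expression interface: evaluates against one row. */
    interface Expr {
        Object eval(List<Object> row);
    }

    /** Plain reference to a column by position. */
    static final class ColumnExpr implements Expr {
        private final int index;
        ColumnExpr(int index) { this.index = index; }
        @Override public Object eval(List<Object> row) { return row.get(index); }
    }

    /** Bucket number expressed as a regular expression over the bucketing column. */
    static final class BucketExpr implements Expr {
        private final Expr bucketCol;
        private final int numBuckets;
        BucketExpr(Expr bucketCol, int numBuckets) {
            this.bucketCol = bucketCol;
            this.numBuckets = numBuckets;
        }
        @Override public Object eval(List<Object> row) {
            // Assumed bucketing scheme for this sketch: non-negative hash modulo bucket count.
            int hash = bucketCol.eval(row).hashCode();
            return (hash & Integer.MAX_VALUE) % numBuckets;
        }
    }

    /** Evaluates all key expressions uniformly; the bucket key gets no special case. */
    static Object[] evalKeys(List<Expr> keys, List<Object> row) {
        Object[] out = new Object[keys.size()];
        for (int i = 0; i < keys.size(); i++) {
            out[i] = keys.get(i).eval(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Expr> keys = Arrays.asList(new BucketExpr(new ColumnExpr(0), 4), new ColumnExpr(1));
        List<Object> row = Arrays.asList(7, "a");
        System.out.println(Arrays.toString(evalKeys(keys, row))); // prints [3, a]
    }
}
```

With this shape, the operator that evaluates keys does not need to know which key is the bucket number, which is the decoupling the comment above asks for.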

> Make ReduceSinkOperator independent of Acid
> -------------------------------------------
>                 Key: HIVE-15844
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>             Fix For: 2.2.0
>         Attachments: HIVE-15844.01.patch, HIVE-15844.02.patch, HIVE-15844.03.patch, HIVE-15844.04.patch,
> HIVE-15844.05.patch, HIVE-15844.06.patch, HIVE-15844.07.patch, HIVE-15844.08.patch
> # Both FileSinkDesc and ReduceSinkDesc have a special code path for Update/Delete operations.
> It is not always set correctly for ReduceSink. ReduceSinkDeDuplication is one place where
> it gets lost. Even when it isn't set correctly, elsewhere we set ROW_ID to be the partition
> column of the ReduceSinkOperator, and UDFToInteger special-cases it to extract bucketId from
> ROW_ID. We need to modify Explain Plan to record the Write Type (i.e. insert/update/delete) to
> make sure we have tests that can catch errors here.
> # Add some validation at the end of the plan to make sure that RSO/FSO which represent
> the end of the pipeline and write to acid table have WriteType set (to something other than
> # We don't seem to have any tests where the number of buckets is greater than the number of reducers.
> Add those.
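
The end-of-plan validation described in the second item could look roughly like the following Java sketch. It is a toy model under stated assumptions, not Hive's API: Sink, WriteType, and findUnsetSinks are hypothetical names, and NOT_SET is a hypothetical stand-in for whatever "unset" value the real code uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WriteTypeCheckSketch {
    /** Hypothetical write types; NOT_SET stands in for the real "unset" value. */
    enum WriteType { NOT_SET, INSERT, UPDATE, DELETE }

    /** Hypothetical terminal operator (a ReduceSink or FileSink) at the end of a pipeline. */
    static final class Sink {
        final String name;
        final boolean writesToAcidTable;
        final WriteType writeType;
        Sink(String name, boolean writesToAcidTable, WriteType writeType) {
            this.name = name;
            this.writesToAcidTable = writesToAcidTable;
            this.writeType = writeType;
        }
    }

    /** Returns the names of sinks that write to an acid table but have no write type set. */
    static List<String> findUnsetSinks(List<Sink> terminalSinks) {
        List<String> offenders = new ArrayList<>();
        for (Sink s : terminalSinks) {
            if (s.writesToAcidTable && s.writeType == WriteType.NOT_SET) {
                offenders.add(s.name);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        List<Sink> sinks = Arrays.asList(
            new Sink("FS_1", true, WriteType.UPDATE),
            new Sink("RS_2", true, WriteType.NOT_SET),   // write type lost, should be flagged
            new Sink("FS_3", false, WriteType.NOT_SET)); // not an acid write, ignored
        System.out.println(findUnsetSinks(sinks)); // prints [RS_2]
    }
}
```

A check of this kind would run once after all optimizations, so a write type dropped by an earlier rewrite fails planning instead of surfacing later as a wrong result.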

This message was sent by Atlassian JIRA
