hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15573) Vectorization: ACID shuffle ReduceSink is not specialized
Date Wed, 25 Jan 2017 02:46:26 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15837074#comment-15837074 ]

Gopal V commented on HIVE-15573:
--------------------------------

[~mmccline]: LGTM - +1, tests pending.

Nit on the LOG.debug() calls: wrap the ones that call {{Arrays}} methods in an isDebugEnabled() check.
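
A minimal sketch of why the guard matters, assuming an SLF4J-style logger. {{SketchLogger}}, {{DebugGuardSketch}} and the {{format}} helper are hypothetical stand-ins written for this illustration and are not taken from the patch. The point is that the argument to debug() is built before the call, so the {{Arrays}} work happens even when debug logging is off.

```java
import java.util.Arrays;

// Hypothetical stand-in for an SLF4J-style logger; not Hive's actual LOG object.
final class SketchLogger {
  private final boolean debugEnabled;

  SketchLogger(boolean debugEnabled) {
    this.debugEnabled = debugEnabled;
  }

  boolean isDebugEnabled() {
    return debugEnabled;
  }

  void debug(String msg) {
    if (debugEnabled) {
      System.out.println("DEBUG " + msg);
    }
  }
}

final class DebugGuardSketch {
  // Counts how often the array formatting runs; for illustration only.
  static int formatCalls = 0;

  static String format(int[] values) {
    formatCalls++;
    return Arrays.toString(values);
  }

  static void logUnguarded(SketchLogger log, int[] keys) {
    // The string is built before debug() is entered, even if debug is off.
    log.debug("keys " + format(keys));
  }

  static void logGuarded(SketchLogger log, int[] keys) {
    // The guard skips the formatting entirely when debug is off.
    if (log.isDebugEnabled()) {
      log.debug("keys " + format(keys));
    }
  }

  public static void main(String[] args) {
    SketchLogger log = new SketchLogger(false);
    int[] keys = {1, 2, 3};
    logUnguarded(log, keys);
    logGuarded(log, keys);
    // Only the unguarded call paid for the formatting.
    System.out.println("format calls with debug off: " + formatCalls);
  }
}
```

With debug disabled, only the unguarded call performs the formatting, and in a per-row path that cost adds up.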

There needs to be a guard-rail that checks the 2 enums together, in one place. Not all combinations
in the {{BucketNumKind}} x {{PartitionHashCodeKind}} matrix are valid.
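
One possible shape for such a guard-rail, as a sketch only. The enum constants ({{NONE}}, {{EVAL}}, {{UNIFORM}}), the class name {{ShuffleKindGuard}} and the validity rule itself are assumptions made up for illustration; the real constants and the real set of valid pairs have to come from the patch.

```java
import java.util.EnumSet;
import java.util.Set;

final class ShuffleKindGuard {
  // Constant names are hypothetical; the enums in the patch may differ.
  enum BucketNumKind { NONE, EVAL }

  enum PartitionHashCodeKind { NONE, EVAL, UNIFORM }

  // Assumed rule, for illustration only: an evaluated bucket number is
  // allowed only together with an evaluated partition hash code.
  static Set<PartitionHashCodeKind> allowedFor(BucketNumKind bucketNumKind) {
    switch (bucketNumKind) {
      case NONE:
        return EnumSet.allOf(PartitionHashCodeKind.class);
      case EVAL:
        return EnumSet.of(PartitionHashCodeKind.EVAL);
      default:
        throw new IllegalStateException("Unhandled kind " + bucketNumKind);
    }
  }

  // The single place where the two enums are checked together.
  static void validate(BucketNumKind bucketNumKind,
      PartitionHashCodeKind partitionHashCodeKind) {
    if (!allowedFor(bucketNumKind).contains(partitionHashCodeKind)) {
      throw new IllegalArgumentException("Invalid combination "
          + bucketNumKind + " x " + partitionHashCodeKind);
    }
  }

  public static void main(String[] args) {
    validate(BucketNumKind.EVAL, PartitionHashCodeKind.EVAL);
    try {
      validate(BucketNumKind.EVAL, PartitionHashCodeKind.UNIFORM);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Calling a check like this once at operator setup would keep the per-row code free of combination checks.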

Also, final variables inside the loop are very useful for catching issues ahead of time. Moving
these declarations into the loop and making them final means the compiler ensures that no state
is left over from a previous row & that every branch assigns every variable.

{code}
+      int batchIndex;
+      int bucketNum;
+      int hashCode;
+      int keyLength;
{code}
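
A small sketch of the pattern, reusing the {{hashCode}} and {{keyLength}} names from the snippet above. The class {{FinalLoopSketch}}, the {{hashRows}} method, the null handling and the arithmetic are hypothetical and only illustrate definite assignment; they are not the operator's real logic.

```java
final class FinalLoopSketch {
  // Hypothetical per-row computation, only to show the declaration pattern.
  static int[] hashRows(int[] keys, boolean[] isNull) {
    final int[] out = new int[keys.length];
    for (int batchIndex = 0; batchIndex < keys.length; batchIndex++) {
      // Declared per row and final: nothing carries over from the previous
      // row, and each branch must assign each variable exactly once.
      final int hashCode;
      final int keyLength;
      if (isNull[batchIndex]) {
        hashCode = 0;
        keyLength = 0;
      } else {
        hashCode = Integer.hashCode(keys[batchIndex]);
        keyLength = Integer.BYTES;
      }
      out[batchIndex] = 31 * hashCode + keyLength;
    }
    return out;
  }

  public static void main(String[] args) {
    int[] result = hashRows(new int[] {7, 0, 9}, new boolean[] {false, true, false});
    for (int value : result) {
      System.out.println(value);
    }
  }
}
```

If one branch forgot to assign {{keyLength}}, javac would reject the read as "might not have been initialized", which is the early check being asked for.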

> Vectorization: ACID shuffle ReduceSink is not specialized 
> ----------------------------------------------------------
>
>                 Key: HIVE-15573
>                 URL: https://issues.apache.org/jira/browse/HIVE-15573
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions, Vectorization
>    Affects Versions: 2.2.0
>            Reporter: Gopal V
>            Assignee: Matt McCline
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15573.01.patch, HIVE-15573.02.patch, HIVE-15573.03.patch, screenshot-1.png
>
>
> The ACID shuffle disabled murmur hash for the shuffle, due to the bucketing requirements demanding the writable hashcode for the shuffles.
> {code}
>     boolean useUniformHash = desc.getReducerTraits().contains(UNIFORM);
>     if (!useUniformHash) {
>       return false;
>     }
> {code}
> This check protects the fast ReduceSink ops from being used in ACID inserts.
> A specialized case for the following pattern will make ACID insert much faster.
> {code}
>                     Reduce Output Operator
>                       sort order: 
>                       Map-reduce partition columns: _col0 (type: bigint)
>                       value expressions:  ....
> {code}
> !screenshot-1.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
