hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21531) Vectorization: all NULL hashcodes are not computed using Murmur3
Date Wed, 24 Apr 2019 04:53:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824816#comment-16824816 ]

Jesus Camacho Rodriguez commented on HIVE-21531:
------------------------------------------------

[~ashutosh.bapat], [~sankarh], the two tests above keep timing out; you can confirm this on
the ptest server.
Going through the logs, I realized that they are not batched with other tests because of
HIVE-21109, which seems like a step in the right direction. However, that alone does not seem
sufficient to avoid the timeout.
Could you disable them in the meantime and re-enable them once you have fixed the issue
(one possible option may be splitting or rewriting part of those two tests)? Currently,
we cannot check anything into master.

> Vectorization: all NULL hashcodes are not computed using Murmur3
> ----------------------------------------------------------------
>
>                 Key: HIVE-21531
>                 URL: https://issues.apache.org/jira/browse/HIVE-21531
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 3.1.1
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Critical
>         Attachments: HIVE-21531.1.patch, HIVE-21531.1.patch, HIVE-21531.2.patch, HIVE-21531.WIP.patch
>
>
> The comments in Vectorized hash computation call out the MurmurHash implementation (the
> one using 0x5bd1e995), while the non-vectorized codepath calls out the Murmur3 one (using
> 0xcc9e2d51).
> The comment here is wrong (the code below it actually calls Murmur3.hash32):
> {code}
>   /**
>    * Batch compute the hash codes for all the serialized keys.
>    *
>    * NOTE: MAJOR MAJOR ASSUMPTION:
>    *     We assume that HashCodeUtil.murmurHash produces the same result
>    *     as MurmurHash.hash with seed = 0 (the method used by ReduceSinkOperator for
>    *     UNIFORM distribution).
>    */
>   protected void computeSerializedHashCodes() {
>     int offset = 0;
>     int keyLength;
>     byte[] bytes = output.getData();
>     for (int i = 0; i < nonNullKeyCount; i++) {
>       keyLength = serializedKeyLengths[i];
>       hashCodes[i] = Murmur3.hash32(bytes, offset, keyLength, 0);
>       offset += keyLength;
>     }
>   }
> {code}
> but the wrong comment is followed in the Vector RS operator, which hashes the all-NULL key with HashCodeUtil instead:
> {code}
>       System.arraycopy(nullKeyOutput.getData(), 0, nullBytes, 0, nullBytesLength);
>       nullKeyHashCode = HashCodeUtil.calculateBytesHashCode(nullBytes, 0, nullBytesLength);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
