drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zelaine Fong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data
Date Tue, 24 Nov 2015 17:07:11 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024866#comment-15024866
] 

Zelaine Fong commented on DRILL-4119:
-------------------------------------

[~amansinha100] - when you say "the original XXHash's C implementation and based on an initial
analysis that one produces different hash value than our implementation and does not seem
to have the same 'even number' pattern", are you saying that our current hash64 implementation
is different from the original one?  If yes, does that mean something got lost in the translation
from C to Java?

> Skew in hash distribution for varchar (and possibly other) types of data
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4119
>                 URL: https://issues.apache.org/jira/browse/DRILL-4119
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.3.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: 1.4.0
>
>
> We are seeing substantial skew for an Id column that contains varchar data of length
32.   It is easily reproducible by a group-by query: 
> {noformat}
> Explain plan for SELECT SomeId From table GROUP BY SomeId;
> ...
> 01-02          HashAgg(group=[{0}])
> 01-03            Project(SomeId=[$0])
> 01-04              HashToRandomExchange(dist0=[[$0]])
> 02-01                UnorderedMuxExchange
> 03-01                  Project(SomeId=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02                    HashAgg(group=[{0}])
> 03-03                      Project(SomeId=[$0])
> {noformat}
> The string id happens to be of the following type: 
> {noformat}
> e4b4388e8865819126cb0e4dcaa7261d
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message