drill-issues mailing list archives

From "Parth Chandra (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-2402) Current method of combining hash values can produce skew
Date Mon, 09 Mar 2015 16:28:38 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Parth Chandra updated DRILL-2402:
---------------------------------
    Fix Version/s: 0.9.0

> Current method of combining hash values can produce skew
> --------------------------------------------------------
>
>                 Key: DRILL-2402
>                 URL: https://issues.apache.org/jira/browse/DRILL-2402
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>    Affects Versions: 0.8.0
>            Reporter: Aman Sinha
>            Assignee: Jacques Nadeau
>             Fix For: 0.9.0
>
>
> The current method of combining hash values of multiple columns can produce skew in some cases, even though each individual hash function does not produce skew. The combining function is XOR:
> {code}
>    hash(a, b) = XOR (hash(a), hash(b))
> {code}
> The above result will be 0 for all rows where a = b, because then hash(a) = hash(b). This clearly creates severe skew and hurts the performance of queries that do a HashAggregate-based group-by on {a, b} or a HashJoin on both columns.
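
The skew described above can be reproduced with a small sketch. Everything here is illustrative: `h` is a stand-in 32-bit hash built on CRC32, not Drill's hash function, and `combine_chained` is one common order-sensitive alternative (seeding the second hash with the first), not necessarily the fix adopted for this issue.

```python
import zlib


def h(value, seed=0):
    """Stand-in 32-bit hash (seeded CRC32). Hypothetical, not Drill's hash."""
    return zlib.crc32(repr(value).encode("utf-8"), seed) & 0xFFFFFFFF


def combine_xor(a, b):
    """The combining method described in the issue: XOR of the column hashes."""
    return h(a) ^ h(b)


def combine_chained(a, b):
    """Assumed alternative: use hash(a) as the seed when hashing b."""
    return h(b, seed=h(a))


# Rows where both columns hold the same value, e.g. GROUP BY a, b with a = b.
rows = [(i, i) for i in range(1000)]

xor_hashes = {combine_xor(a, b) for a, b in rows}
chained_hashes = {combine_chained(a, b) for a, b in rows}

print(len(xor_hashes))      # 1, because every row combines to 0
print(len(chained_hashes))  # number of distinct values with chaining
```

With XOR, all 1000 rows land in the same hash bucket, so a single partition or hash table slot receives every row. XOR is also symmetric, so the pairs (x, y) and (y, x) always collide as well.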



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
