drill-dev mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2402) Current method of combining hash values can produce skew
Date Fri, 06 Mar 2015 23:57:38 GMT
Aman Sinha created DRILL-2402:
---------------------------------

             Summary: Current method of combining hash values can produce skew
                 Key: DRILL-2402
                 URL: https://issues.apache.org/jira/browse/DRILL-2402
             Project: Apache Drill
          Issue Type: Improvement
          Components: Functions - Drill
    Affects Versions: 0.8.0
            Reporter: Aman Sinha
            Assignee: Jacques Nadeau


The current method of combining the hash values of multiple columns can produce skew in some
cases, even though each individual hash function does not produce skew on its own.  The
combining function is XOR:
{code}
   hash(a, b) = XOR (hash(a), hash(b))
{code}
For every row where a = b, hash(a) = hash(b), so the result above is 0.  All such rows land in
the same hash bucket, which creates severe skew and hurts the performance of queries that do a
HashAggregate-based group-by on {a, b} or a HashJoin on both columns.
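
To make the problem concrete, here is a minimal Java sketch, not Drill's actual code.  The
per-column hash() below is a stand-in (the 32-bit finalizer from MurmurHash3), and the class and
method names are hypothetical.  seededCombine() is only one possible order-dependent alternative,
shown for comparison; it is an assumption and not necessarily the fix this issue will adopt.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntBinaryOperator;

class HashCombineDemo {
    // Stand-in per-column hash (assumption: NOT Drill's hash function).
    // This is the 32-bit finalizer step from MurmurHash3.
    static int hash(int x) {
        x ^= x >>> 16;
        x *= 0x85ebca6b;
        x ^= x >>> 13;
        x *= 0xc2b2ae35;
        x ^= x >>> 16;
        return x;
    }

    // The combiner described in this issue: XOR of the per-column hashes.
    static int xorCombine(int a, int b) {
        return hash(a) ^ hash(b);
    }

    // Hypothetical alternative: feed the first column's hash in as a seed
    // for the second column, so the result depends on column order.
    static int seededCombine(int a, int b) {
        return hash(b ^ hash(a));
    }

    // Counts distinct combined hashes over the n rows (0,0), (1,1), ..., (n-1,n-1).
    static int distinctOnDiagonal(IntBinaryOperator combiner, int n) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(combiner.applyAsInt(i, i));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("XOR combiner:    "
            + distinctOnDiagonal(HashCombineDemo::xorCombine, n) + " distinct hash value(s)");
        System.out.println("Seeded combiner: "
            + distinctOnDiagonal(HashCombineDemo::seededCombine, n) + " distinct hash value(s)");
    }
}
```

With the XOR combiner, all 1000 rows with a = b collapse to the single hash value 0, whatever
the per-column hash is.  The seeded variant spreads the same rows over many values.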



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
