hadoop-pig-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-871) Improve distribution of keys in reduce phase
Date Fri, 03 Jul 2009 10:19:47 GMT
Improve distribution of keys in reduce phase
--------------------------------------------

                 Key: PIG-871
                 URL: https://issues.apache.org/jira/browse/PIG-871
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.3.0
            Reporter: Ankur


The default hashing scheme used to distribute keys in the reduce phase sometimes spreads
keys unevenly, leaving 5-10% of reducers overloaded with data.
This bottleneck makes Pig jobs very slow and gives users a bad impression.

While there is no bulletproof solution to the problem in general, the hashing can certainly
be improved for better distribution. The proposal is to evaluate and incorporate other
hashing schemes that offer high avalanche and a more even distribution. We can start by evaluating
MurmurHash, which is Apache 2.0 licensed and freely available at http://www.getopt.org/murmur/MurmurHash.java
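
To make the skew concrete, here is a minimal sketch of the kind of comparison the evaluation
could start from. It is an illustration under stated assumptions, not Pig's actual partitioning
code: the class name KeyDistributionSketch, the helper names, the seed, and the choice of integer
keys are all hypothetical. The partition formula mirrors the common HashPartitioner-style
"clear the sign bit, then take the remainder" scheme, and murmur2 is a 32-bit MurmurHash2-style
function written out for this example rather than copied from the file linked above. The keys
are deliberately chosen as multiples of the reducer count, a worst case for an identity-like
hashCode.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class KeyDistributionSketch {

    // Arbitrary seed chosen for this sketch (assumption, not taken from Pig).
    static final int SEED = 0x9747b28c;

    // HashPartitioner-style assignment: clear the sign bit, then take the remainder.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // 32-bit MurmurHash2-style hash over a byte array (illustrative sketch).
    static int murmur2(byte[] data, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int len = data.length;
        int h = seed ^ len;
        int i = 0;
        // Mix the input four bytes at a time.
        while (len - i >= 4) {
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
        }
        // Mix the remaining 0-3 bytes; the cases fall through on purpose.
        switch (len - i) {
            case 3: h ^= (data[i + 2] & 0xff) << 16;
            case 2: h ^= (data[i + 1] & 0xff) << 8;
            case 1: h ^= (data[i] & 0xff);
                    h *= m;
        }
        // Final avalanche steps.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Counts how many keys land on each reducer. Keys are multiples of
    // numReducers, which is a worst case for an identity-like hashCode.
    static int[] loads(int numKeys, int numReducers, boolean useMurmur) {
        int[] counts = new int[numReducers];
        for (int i = 0; i < numKeys; i++) {
            int key = i * numReducers;
            int hash = useMurmur
                    ? murmur2(ByteBuffer.allocate(4).putInt(key).array(), SEED)
                    : Integer.hashCode(key);
            counts[partition(hash, numReducers)]++;
        }
        return counts;
    }

    // Largest per-reducer load, i.e. the size of the slowest reducer's input.
    static int max(int[] counts) {
        int best = 0;
        for (int c : counts) {
            best = Math.max(best, c);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("hashCode-style: " + Arrays.toString(loads(10000, 10, false)));
        System.out.println("murmur-style:   " + Arrays.toString(loads(10000, 10, true)));
    }
}
```

With the hashCode-style scheme every key in this synthetic input lands on reducer 0, because
Integer.hashCode returns the value itself. A real evaluation would replace the synthetic keys
with key samples from slow production jobs and compare the maximum per-reducer load under each
hashing scheme.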


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

